Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t5udN-009Rce-1v for pgsql-hackers@arkaria.postgresql.org; Tue, 29 Oct 2024 22:24:17 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1t5udK-003Wp6-K9 for pgsql-hackers@arkaria.postgresql.org; Tue, 29 Oct 2024 22:24:14 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t5udJ-003Wok-RG for pgsql-hackers@lists.postgresql.org; Tue, 29 Oct 2024 22:24:14 +0000 Received: from mail-pl1-x633.google.com ([2607:f8b0:4864:20::633]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1t5udG-003Wiu-4h for pgsql-hackers@postgresql.org; Tue, 29 Oct 2024 22:24:12 +0000 Received: by mail-pl1-x633.google.com with SMTP id d9443c01a7336-20c7b9087c4so3980895ad.1 for ; Tue, 29 Oct 2024 15:24:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=upgrade.com; s=google; t=1730240649; x=1730845449; darn=postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=R5DPYR1HFC6Y1cAkLLNvjZLKDJvHUQ3e46a6ZaAGRHU=; b=kAzI6laz97kEe2tgaRfupAZ5H7u5ekpAiQsl7LCs82BbgezhyAjXTywkvVx6O7afK3 C7lq9sw5Ubr+zg7OMX/6rS6Kf0o2aTQ7JcG0i/iK9lYRC57OcjxtCNOhFNDG+gsIqGMa 1ABBxoyEwz2lsOnjZ1MXtNYszwl7J4LR0K4z2iiI5Hf4K+fh/p/c57c2oTAViK2rDqrC RgARt+klZuNz5/6foS1fZW+dbBXBA/KNh8i4sMsvZGISatYil2HK+T63D8A6tOoyZBMu lvYsSoWVM1kvq2d7fmYen541ybiGkHjWGv55lr/XKCchRoiHw8IxjrJj8KigImMq8uuh gLFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730240649; x=1730845449; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=R5DPYR1HFC6Y1cAkLLNvjZLKDJvHUQ3e46a6ZaAGRHU=; b=Gjpu4JhVV31vco0Rct11tnNguzXm1ORs02tUpEEq8RKgT9I/apycdO6gWUYx+DPDvk 1lOuOqC7crF88IRIXtSOC9Vef9dtZ9LYnXYvxS+KbrLB1LVhtJPTI0oRsMCQL58l2N0g ixVPFBXBqz7PVQfmvbBPb/faruwLuWlXixWjacLeDOpwwoaBz0//3//f2AwchfDg+zLD lcmCPjeoPof8pjwl5cYpAP+1tByfQ+F9F2ubtiinMqPOe4OoKmrdK0OgGvAB9uGecr+D 2I0tdn0sYqKdaGndblL7nRzIU2ZDsKQoIhHC1X0qrumFol32BtzrYmTo5RxMoTgB1+Pk n8Og== X-Forwarded-Encrypted: i=1; AJvYcCVfSz+B+OAAU+zwmgWKZWkPGEU8tYliqL0BFHG1Uiq/kGtllpzjShmcL3O2cKBlEJ1jolgnS6CSntBd/Kim@postgresql.org X-Gm-Message-State: AOJu0YxlG2FvUiHwiQJJdZnuERDdQIgcQwV/HPKNOVYsUJD1P/qS8I2r YHYWO7X7pazj+QigtHK7srV419ICNkXTNOr/F/Fg3k7npjBQSvm7GijQLwliNsg= X-Google-Smtp-Source: AGHT+IGD06tumBpWzt0fAATRAm+a7ISD60xhUbBUFZ58W+YMOXUO+f8svc6xEnoqFuHzrxEyFfv5dw== X-Received: by 2002:a17:902:c404:b0:20c:df08:9a78 with SMTP id d9443c01a7336-210c6c717e5mr80387295ad.13.1730240648734; Tue, 29 Oct 2024 15:24:08 -0700 (PDT) Received: from smtpclient.apple (syn-070-113-014-216.res.spectrum.com. [70.113.14.216]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-210bc045259sm70264815ad.266.2024.10.29.15.24.07 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 29 Oct 2024 15:24:08 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3776.700.51.11.1\)) Subject: Re: Vacuum statistics From: Jim Nasby In-Reply-To: <6732acf8ce0f31025b535ae1a64568750924a887.camel@moonset.ru> Date: Tue, 29 Oct 2024 17:23:55 -0500 Cc: Alexander Korotkov , Alena Rybakina , Masahiko Sawada , Melanie Plageman , jian he , pgsql-hackers , a.lepikhov@postgrespro.ru Content-Transfer-Encoding: quoted-printable Message-Id: <5AA8FFD5-6DE2-4A31-8E00-AE98F738F5D1@upgrade.com> References: <53c47c2d-72a5-44f2-900c-9973b2af1808@tantorlabs.com> <4a902cea-54fb-41b5-b208-b84731a5f577@postgrespro.ru> <092adec6-4eae-4bd4-bd0d-473a9df1282b@tantorlabs.com> <3deae1bd-ad84-4459-a26e-04c9136b84e9@postgrespro.ru> <9b10c6d3-52c4-4eef-b67c-c33442667729@postgrespro.ru> <9485d892-fd04-4e3a-ac24-7dd767cb7333@postgrespro.ru> <0B6CBF4C-CC2A-4200-9126-CE3A390D938B@upgrade.com> <6732acf8ce0f31025b535ae1a64568750924a887.camel@moonset.ru> To: Andrei Zubkov X-Mailer: Apple Mail (2.3776.700.51.11.1) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Oct 29, 2024, at 7:40=E2=80=AFAM, Andrei Zubkov = wrote: >=20 > Hi, >=20 > Thanks for your attention to our patch! >=20 > On Mon, 2024-10-28 at 16:03 -0500, Jim Nasby wrote: >>> Yes, but as Masahiko-san pointed out, PgStat_TableCounts is almost >>> tripled in space. That a huge change from having no statistics on >>> vacuum to have it in much more detail than everything else we >>> currently have. I think the feasible way might be to introduce >>> some >>> most demanded statistics first then see how it goes. >>=20 >> Looking at the stats I do think the WAL stats are probably not >> helpful. First, there=E2=80=99s nothing users can do to tune how much = WAL is >> generated by vacuum. Second, this introduces the risk of users saying >> =E2=80=9CWow, vacuum is creating a lot of WAL! I=E2=80=99m going to = turn it down!=E2=80=9D, >> which is most likely to make matters worse. There=E2=80=99s already a = lot of >> stuff that goes into WAL without any detailed logging; if we ever >> wanted to provide a comprehensive view of what data is in WAL that >> should be handled separately. >=20 > Yes, there is nothing we can directly do with WAL generated by vacuum, > but WAL generation is the part of vacuum work, and it will indirectly > affected by the changes of vacuum settings. So, WAL statistics is one > more dimension of vacuum workload. Also WAL stat is universal metric > which is measured cluster-wide and on the statement-level with > pg_stat_statements. Vacuum WAL counters will explain the part of > difference between those metrics. Besides vacuum WAL counters can be > used to locate abnormal vacuum behavior caused by a bug or the data > corruption. I think if the DBA is smart enough to look at vacuum WAL > generated stats and to understand what it means, the decision to > disable the autovacuum due to its WAL generation is unlikely. I=E2=80=99m generally for more stats rather than less - really just a = question of how much we=E2=80=99re worried about stats overhead. > Anyway I think some stats can be excluded to save some memory. The > first candidates are the system_time and user_time fields. Those are > very valuable, but are measured by the rusage stats, which won't be > available on all platforms. I think total_time and delay_time would be > sufficient. Yeah, I considered throwing those under the bus. I agree they=E2=80=99re = only marginally useful. > The second is the interrupts field. It is needed for monitoring to = know > do we have them or not, so tracking them on the database level will do > the trick. Interrupt is quite rare event, so once the monitoring = system > will catch one the DBA can go to the server log for the details. Just to confirm=E2=80=A6 by =E2=80=9Cinterrupt=E2=80=9D you mean vacuum = encountered an error? > It seems there is another way. If the vacuum stats doesn't seems to be > mandatory in all systems, maybe we should add some hooks to the vacuum > so that vacuum statistics tracking can be done in an extension. I = don't > think it is a good idea, because vacuum stats seems to me as mandatory > as the vacuum process itself. I=E2=80=99d actually like hooks for all stats, so people can develop = different ways of storing/aggregating them. But I agree that=E2=80=99s a = separate discussion. >> Is there a reason some fields are omitted >> from pg_stat_vacuum_database? While some stats are certainly more >> interesting at the per-relation level, I can=E2=80=99t really think = of any >> that don=E2=80=99t make sense at the database level as well. >=20 > Some of the metrics are table-specific, some index-specific, so we > moved to the database level metrics more or less specific to the whole > database. Can you tell what stats you want to see at the database > level? Here=E2=80=99s the thing with pg_stat_vacuum_database; it=E2=80=99s the = only way to see everything in the whole cluster. So I think the better = question is what metrics simply don=E2=80=99t make sense at that level? = And I don=E2=80=99t really see any that don=E2=80=99t. >> For all the views the docs should clarify that total_blks_written >> means blocks written by vacuum, as opposed to the background Ywriter. >=20 > We have the "Number of database blocks written by vacuum operations > performed on this table" in the docs now. Do you mean we should > specifically note the vacuum process here? The reason the stat is confusing is because it doesn=E2=80=99t have the = meaning that the name implies. Most people that see this will think = it=E2=80=99s actually measuring blocks dirtied, or at least something = closer to that. It definitely hides the fact that many of the dirtied = blocks could actually be written by the bgwriter. So an improvement to = the docs would be =E2=80=9CNumber of blocks written directly by vacuum = or auto vacuum. Blocks that are dirtied by a vacuum process can be = written out by another process.=E2=80=9D Which makes me realize=E2=80=A6 I think vacuum only counts a block as = dirtied if it was previously clean? If so the docs for that metric need = to clarify that vacuum might modify a block but not count it as having = been dirtied. >> Similarly they should clarify the difference between >> rel_blks_(read|hit) and total_blks_(read|hit). In the case of >> pg_stat_vacuum_indexes it=E2=80=99d be better if rel_blks_(read|hit) = were >> called index_blks_(read|hit). Although=E2=80=A6 if total_blks_* is = actually >> the count across the table and all the indexes I don=E2=80=99t know = that we >> even need that counter. I realize that not ever vacuum even looks at >> the indexes, but if we=E2=80=99re going to go into that level of = detail then >> we would (at minimum) need to count the number of times a vacuum >> completely skipped scanning the indexes. >=20 > It is not clear to me enough. The stats described just as it is - > rel_blocks_* tracks blocks of the current heap, and total_* is for the > whole database blocks - not just tables and indexes, vacuum do some > work (quite a little) in the catalog and this work is counted here = too. > Usually this stat won't be helpful, but maybe we can catch unusual > vacuum behavior using this stat. Ok, so this just needs to be clarified in the docs by explicitly stating = what is and isn=E2=80=99t part of the metric. It would also be better = not to use the term =E2=80=9Crel=E2=80=9D since most people don=E2=80=99t = immediately know what that means. So, table_blks_(read|hit) or = index_blks_(read|hit). Also, =E2=80=9Ctotal=E2=80=9D is still not clear to me, at least in the = context of pg_stat_vacuum_indexes. Is that different from = pg_stat_vacuum_tables.total_blks_*? If so, how? If it=E2=80=99s the same = then IMO it should just be removed from pg_stat_vacuum_indexes. >> Sadly index_vacuum_count is may not useful at all at present. At >> minimum you=E2=80=99d need to know the number of times vacuum had run = in >> total. I realize that=E2=80=99s in pg_stat_all_tables, but that = doesn=E2=80=99t help >> if vacuum stats are tracked or reset separately. >=20 > I'm in doubt - is it really possible to reset the vacuum stats > independent of pg_stat_all_tables? Most stats can be independently reset, so I was thinking these = wouldn=E2=80=99t be an exception. If that=E2=80=99s not the case then I = think the docs need to mention pg_stat_all_tables.(auto)vacuum_count, = since it=E2=80=99s in a completely different view. Or better yet, = include the vacuum/analyze related stats that are in pg_stat_all_tables = in pg_stat_vacuum_tables. BTW, have you thought about what stats should be added for ANALYZE? = That=E2=80=99s obviously not as critical as vacuum, but maybe worth = considering as part of this... >> First, there=E2=80=99s still gaps in trying to track HOT; most = notably a >> counter for how many updates would never be HOT eligible because they >> modify indexes. pg_stat_all_tables.n_tup_newpage_upd is really >> limited without that info. >=20 > Nice catch, I'll think about it. Those are not directly connected to > the vacuum workload but those are important. Just to re-iterate: I don=E2=80=99t think this patch has to boil the = ocean and try to handle all these extra use cases. >> There should also be stats about unused line pointers - in degenerate >> cases the lp array can consume a significant portion of heap storage. >>=20 >> Monitoring bloat would be a lot more accurate if vacuum reported >> total tuple length for each run along with the total number of tuples >> it looked at. Having that info would make it trivial to calculate >> average tuple size, which could then be applied to reltuples and >> relpages to calculate how much space would being lost to bloat. >=20 > Yes, bloat tracking is in our plans. Right now it is not clear enough > how to do it in the most reliable and convenient way. >=20 >> Autovacuum will self-terminate if it would block another process >> (unless it=E2=80=99s an aggressive vacuum) - that=E2=80=99s = definitely something that >> should be tracked. Not just the number of times that happens, but >> also stats about how much work was lost because of this. >=20 > Agreed. >=20 >> Shrinking a relation (what vacuum calls truncation, which is very >> confusing with the truncate command) is a rather complex process that >> currently has no visibility. >=20 > In this patch table truncation can be seen in the "pages_removed" = field > of "pg_stat_vacuum_tables" at least as the cumulative number of = removed > pages. It is not clear enough, but it is visible. Ahh, good point. I think it=E2=80=99s probably worth adding a counter = (to this patch) for how many times vacuum actually decided to do page = removal, because it=E2=80=99s (presumably) a pretty rare event. Without = that counter it=E2=80=99s very hard to make any sense of the number of = pages removed (other than being able to see some were removed, at least = once). >> Tuning vacuum_freeze_min_age (and the MXID variant) is rather >> complicated. We maybe have enough stats on whether it could be set >> lower, but there=E2=80=99s no visibility on how the settings affect = how often >> vacuum decides to be aggressive. At minimum, we should have stats on >> when vacuum is aggressive, especially since it significantly changes >> the behavior of autovac. >=20 > When you say "agressive" do you mean the number of times when the > vacuum was processing the table with the FREEZE intention? I think = this > is needed too. Yes. I intentionally use the term =E2=80=9Caggressive=E2=80=9D (as the = code does) to avoid confusion with the FREEZE option (which as I=E2=80=99m= sure you know simply forces some GUCs to 0). Further complicating this = is that auto vac will report this as =E2=80=9Cto prevent = wraparound=E2=80=9D=E2=80=A6 In any case=E2=80=A6 I=E2=80=99m actually leaning towards there should = be a complete second set of counters for aggressive vacuums, because of = how differently they work. :( >> I saw someone else already mentioned tuning vacuum memory usage, but >> I=E2=80=99ll mention it again. Even if the issues with = index_vacuum_count are >> fixed that still only tells you if you have a problem; it doesn=E2=80=99= t >> give you a great idea of how much more memory you need. The best you >> can do is assuming you need (number of passes - 1) * current memory. >=20 > Do you think such approach is insufficient? It seems we do not need > byte-to-byte accuracy here. Byte-for-byte, no. But I do wonder if there=E2=80=99s any way to do = better than some multiple of what *_work_mem was set to. And setting that aside, another significant problem is that you can=E2=80=99= t actually do anything here without actually knowing what memory setting = was used, which is definitely not a given. Off-hand I don=E2=80=99t see = anyway this can actually be tuned (at all) with nothing but counters. :( Definitely out of scope for this patch though :) >> Speaking of which=E2=80=A6 there should be stats on any time vacuum = decided >> on it=E2=80=99s own to skip index processing due to wraparound = proximity. >=20 > Maybe we should just count the number of times when the vacuum was > started to prevent wraparound? Unfortunately even that isn=E2=80=99t simple=E2=80=A6 auto vac and = manual vac have different GUCs, and of course there=E2=80=99s the FREEZE = option. And then there=E2=80=99s the issue that MXIDs are handled = completely separately. Even ignoring all of that=E2=80=A6 by default an aggressive vacuum = won=E2=80=99t skip indexes. That only happens when you hit = vacuum_(multixact_)failsafe_age. BTW, something I=E2=80=99ve been mulling over is what stats related to = cleanup might be tracked at a system level. I=E2=80=99m thinking along = the lines of how often heap_prune_page or the index marking code come = across a dead tuple they can=E2=80=99t do anything about yet because = it=E2=80=99s still visible. While you could track that per-relation, = I=E2=80=99m not sure how helpful that actually is since it=E2=80=99s = really a long-running transaction problem. Similarly, it=E2=80=99d be nice if we had stats about how often all of = the auto vac workers were occupied; something that=E2=80=99s also global = in nature.