Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t5Wuy-006kMX-27 for pgsql-hackers@arkaria.postgresql.org; Mon, 28 Oct 2024 21:04:52 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1t5Wtw-009mpU-OY for pgsql-hackers@arkaria.postgresql.org; Mon, 28 Oct 2024 21:03:49 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t5Wtw-009mpL-As for pgsql-hackers@lists.postgresql.org; Mon, 28 Oct 2024 21:03:48 +0000 Received: from mail-pj1-x102b.google.com ([2607:f8b0:4864:20::102b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1t5Wtt-003Tkr-NE for pgsql-hackers@postgresql.org; Mon, 28 Oct 2024 21:03:47 +0000 Received: by mail-pj1-x102b.google.com with SMTP id 98e67ed59e1d1-2e2d83f15f3so635836a91.0 for ; Mon, 28 Oct 2024 14:03:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=upgrade.com; s=google; t=1730149423; x=1730754223; darn=postgresql.org; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:from:to:cc:subject:date:message-id:reply-to; bh=10W0r/ALjOjTMtOc9arANu2ob1pI8/ilILB4PRkhLdQ=; b=G35O2ERn3Nm4/f2vHhcfVcFLqRQw1BHUnAj0FnBIE1+KRz5ReM1HEuV2pJa0WJOaT7 DAsUmM5o96/Q3FNJSJd9eCjHh/OXZaFjCalZ39EmfiK14e8573HSSfzcH5fngik5qvfj Kl8jeFxjFyV+bYUwjkdBsQu2P8PfyDqDU1TKBwbKCNMAtpBfmJMeBN4uWOtULxy+wuZk IsCnUFBCbWibV3g2Rfl2YfZCUpO9GjQWT9+1BkGHihlnFrKkASsoZK5TMDRA7dIlTyTI cLCYykzwaY+VYyJaXOKEQ8ix744FAGx5GstGB+gHpfzq5+gzAm3G6FBZJoG6ZlcZxdSl cFdQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730149423; x=1730754223; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=10W0r/ALjOjTMtOc9arANu2ob1pI8/ilILB4PRkhLdQ=; b=MdUNOOKl+2Fnf3+wocBVWk2YDleESMor8kY/OEHC9SzZnraVEllUPR8YtjFisvYWCy l5fXsqlKDfTzYNo6s8q7hDj3iHuAe232f/2DrFd+esM0JpaYsGbaE04GAmWzjwlJ3EUc VHo9raVqKlY5DU0rq0w/NwpqMPBf5sn5N08ZqopZzeMz6X6TPnRyHW5bd4O4DF9hMfTY OMqNU43/btTIKNxiszdpyP1zi4HD4P73FzEibplOk5kIhfhHNEhpbljuVjCEpwdWPm9O 0lXs0R/DZebJ19J5leyi4hJMFEK02UjZBAY34jhQNbKPCf3t5Zg1OlTLv+blz0nKn1/1 KI6g== X-Forwarded-Encrypted: i=1; AJvYcCWcqO4z3rIVpMYdQWpvzs6IstwBHW3kOzX+zi92m2n1GC0LWIMkiXZxD4kqGQU51nXMfRXE9UwZ9mkwWV6Y@postgresql.org X-Gm-Message-State: AOJu0Yxn4oRd9Ici33JRg+OsgGBB999epz0Zlww6WVcyJKD9nEJbSrNN ei/cYOIoYuRzxgUqRRZoNa3nVSjqdnGkpdSCsfVnYYE5yU7i3fLUbQCnGH83rjc= X-Google-Smtp-Source: AGHT+IGDhkzntpEWdHRbyVuyQSXVHqmIkg25P+bxqAJbQy4cGcoCT9bVpV3NaRqBnwez4ecvzqkZqA== X-Received: by 2002:a17:90a:4bc5:b0:2e2:c04b:da94 with SMTP id 98e67ed59e1d1-2e8f11a71b2mr4764612a91.5.1730149423119; Mon, 28 Oct 2024 14:03:43 -0700 (PDT) Received: from smtpclient.apple (syn-070-113-014-216.res.spectrum.com. [70.113.14.216]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2e77e534f09sm9833561a91.32.2024.10.28.14.03.41 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 28 Oct 2024 14:03:42 -0700 (PDT) From: Jim Nasby Message-Id: <0B6CBF4C-CC2A-4200-9126-CE3A390D938B@upgrade.com> Content-Type: multipart/alternative; boundary="Apple-Mail=_8359B6D1-0916-4124-8EEC-65C9700F8F01" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3776.700.51.11.1\)) Subject: Re: Vacuum statistics Date: Mon, 28 Oct 2024 16:03:30 -0500 In-Reply-To: Cc: Alena Rybakina , Masahiko Sawada , Melanie Plageman , Andrei Zubkov , jian he , Ilia Evdokimov , Alena Rybakina , pgsql-hackers , a.lepikhov@postgrespro.ru To: Alexander Korotkov References: <53c47c2d-72a5-44f2-900c-9973b2af1808@tantorlabs.com> <4a902cea-54fb-41b5-b208-b84731a5f577@postgrespro.ru> <092adec6-4eae-4bd4-bd0d-473a9df1282b@tantorlabs.com> <3deae1bd-ad84-4459-a26e-04c9136b84e9@postgrespro.ru> <9b10c6d3-52c4-4eef-b67c-c33442667729@postgrespro.ru> <9485d892-fd04-4e3a-ac24-7dd767cb7333@postgrespro.ru> X-Mailer: Apple Mail (2.3776.700.51.11.1) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --Apple-Mail=_8359B6D1-0916-4124-8EEC-65C9700F8F01 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Oct 28, 2024, at 2:07=E2=80=AFPM, Alexander Korotkov = wrote: >=20 >> I suppose that if we turn off statistics collection for a certain = object, we can miss it. In addition, the user may not enable the = parameter for the object in time, because he will forget about it. >=20 > I agree with this point. Additionally, in order to benefit from > gatherting vacuum statistics only for some relations in terms of > space, we need to handle variable-size stat entries. That would > greatly increase the complexity. Could vacuum stats be treated as a separate category instead of adding = it to PgStat_TableCounts? >> As for the second option, now I cannot say which statistics can be = removed, to be honest. So far, they all seem necessary. >=20 > Yes, but as Masahiko-san pointed out, PgStat_TableCounts is almost > tripled in space. That a huge change from having no statistics on > vacuum to have it in much more detail than everything else we > currently have. I think the feasible way might be to introduce some > most demanded statistics first then see how it goes. Looking at the stats I do think the WAL stats are probably not helpful. = First, there=E2=80=99s nothing users can do to tune how much WAL is = generated by vacuum. Second, this introduces the risk of users saying = =E2=80=9CWow, vacuum is creating a lot of WAL! I=E2=80=99m going to turn = it down!=E2=80=9D, which is most likely to make matters worse. There=E2=80= =99s already a lot of stuff that goes into WAL without any detailed = logging; if we ever wanted to provide a comprehensive view of what data = is in WAL that should be handled separately. The rest of the stats all look important. In fact, I think there=E2=80=99s= even more stats that could be included (such as all frozen/visible = pages skipped) - even more reason to look at having separate controls = for tracking vacuum stats. There=E2=80=99s also an argument to be made = for tracking autovac separately from manual vacuum. So long-term we = might want to look at other ways to handle these stats, not only because = of the large number of stats, but because they would be updated very = infrequently compared to other stats counters. Ironically, the old stats = system would probably have been more than sufficient for these stats. = Tracking them in a real table might also be an option. Is there a reason some fields are omitted from pg_stat_vacuum_database? = While some stats are certainly more interesting at the per-relation = level, I can=E2=80=99t really think of any that don=E2=80=99t make sense = at the database level as well. Looking at the per table/index stats, I strongly dislike the use of the = term =E2=80=9Cdelete=E2=80=9D - it is a recipe for confusion with row = deletion.. A much better term is =E2=80=9Cremove=E2=80=9D or = =E2=80=9Cremoved=E2=80=9D. I realize the term =E2=80=9Cdelete=E2=80=9D = is used in places in vacuum logging, but IMO we should fix that as well = instead of doubling-down on it. I think =E2=80=9Cinterrupts=E2=80=9D is also a very confusing name - = those fields should just be called =E2=80=9Cerrors=E2=80=9D. I realize =E2=80=9Crelname=E2=80=9D is being used for consistency with = pg_stat_all_(tables|indexes), but I=E2=80=99m not sure it makes sense to = double-down on that. Especially in pg_stat_vacuum_indexes, where it=E2=80=99= s not completely clear whether relname is referring to the table or the = index. I=E2=80=99m also inclined to say that the name of the table = should be included in pg_stat_vacuum_indexes. For all the views the docs should clarify that total_blks_written means = blocks written by vacuum, as opposed to the background writer. Similarly = they should clarify the difference between rel_blks_(read|hit) and = total_blks_(read|hit). In the case of pg_stat_vacuum_indexes it=E2=80=99d = be better if rel_blks_(read|hit) were called index_blks_(read|hit). = Although=E2=80=A6 if total_blks_* is actually the count across the table = and all the indexes I don=E2=80=99t know that we even need that counter. = I realize that not ever vacuum even looks at the indexes, but if we=E2=80=99= re going to go into that level of detail then we would (at minimum) need = to count the number of times a vacuum completely skipped scanning the = indexes. Having rev_all_(frozen|visible)_pages in the same view as vacuum stats = will confuse users into thinking that vacuum is clearing the bits. Those = fields really belong in pg_stat_all_tables. Sadly index_vacuum_count is may not useful at all at present. At minimum = you=E2=80=99d need to know the number of times vacuum had run in total. = I realize that=E2=80=99s in pg_stat_all_tables, but that doesn=E2=80=99t = help if vacuum stats are tracked or reset separately. At minimum the = docs should mention them. They also need to clarify if = index_vacuum_count is incremented per-index or per-pass (hopefully the = later). Assuming it=E2=80=99s per-pass, a better name for the field = would be index_vacuum_passes, index_passes, index_pass_count, or = similar. But even with that we still need a counter for the number of = vacuums where index processing was skipped. Other items First, thanks to everyone that=E2=80=99s put work into this patch - = it=E2=80=99s a big step forward. I certainly don=E2=80=99t want the = perfect to be the enemy of the good, but since the size of these stats = entries has already come up as a concern I want to consider use cases = that would still not be covered by this patch. I=E2=80=99m not = suggesting these need to be added now, but IMHO they=E2=80=99re logical = next steps (that would also mean more counters). The cases below would = probably mean at least doubling the number of vacuum-related counters, = at least at the table level. First, there=E2=80=99s still gaps in trying to track HOT; most notably a = counter for how many updates would never be HOT eligible because they = modify indexes. pg_stat_all_tables.n_tup_newpage_upd is really limited = without that info. There should also be stats about unused line pointers - in degenerate = cases the lp array can consume a significant portion of heap storage. Monitoring bloat would be a lot more accurate if vacuum reported total = tuple length for each run along with the total number of tuples it = looked at. Having that info would make it trivial to calculate average = tuple size, which could then be applied to reltuples and relpages to = calculate how much space would being lost to bloat. Autovacuum will self-terminate if it would block another process (unless = it=E2=80=99s an aggressive vacuum) - that=E2=80=99s definitely something = that should be tracked. Not just the number of times that happens, but = also stats about how much work was lost because of this. Shrinking a relation (what vacuum calls truncation, which is very = confusing with the truncate command) is a rather complex process that = currently has no visibility. Tuning vacuum_freeze_min_age (and the MXID variant) is rather = complicated. We maybe have enough stats on whether it could be set = lower, but there=E2=80=99s no visibility on how the settings affect how = often vacuum decides to be aggressive. At minimum, we should have stats = on when vacuum is aggressive, especially since it significantly changes = the behavior of autovac. I saw someone else already mentioned tuning vacuum memory usage, but = I=E2=80=99ll mention it again. Even if the issues with = index_vacuum_count are fixed that still only tells you if you have a = problem; it doesn=E2=80=99t give you a great idea of how much more = memory you need. The best you can do is assuming you need (number of = passes - 1) * current memory. Speaking of which=E2=80=A6 there should be stats on any time vacuum = decided on it=E2=80=99s own to skip index processing due to wraparound = proximity. I=E2=80=99m sure there=E2=80=99s some other use cases that I=E2=80=99m = not thinking of.= --Apple-Mail=_8359B6D1-0916-4124-8EEC-65C9700F8F01 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8

On Oct 28, 2024, at 2:07=E2=80=AFPM, Alexander = Korotkov <aekorotkov@gmail.com> wrote:

I suppose that if we turn off statistics = collection for a certain object, we can miss it. In addition, the user = may not enable the parameter for the object in time, because he will = forget about it.

I agree = with this point.  Additionally, in order to benefit from
gatherting vacuum statistics only for some = relations in terms of
space, = we need to handle variable-size stat entries.  That would
greatly increase the complexity.

Could vacuum stats be = treated as a separate category instead of adding it to = PgStat_TableCounts?

As for the second option, now I cannot say which = statistics can be removed, to be honest. So far, they all seem = necessary.

Yes, = but as Masahiko-san pointed out, PgStat_TableCounts is almost
tripled in space.  That a huge change = from having no statistics on
vacuum = to have it in much more detail than everything else we
currently have.  I think the feasible = way might be to introduce some
most = demanded statistics first then see how it goes.

Looking at the stats I do = think the WAL stats are probably not helpful. First, there=E2=80=99s = nothing users can do to tune how much WAL is generated by vacuum. = Second, this introduces the risk of users saying =E2=80=9CWow, vacuum is = creating a lot of WAL! I=E2=80=99m going to turn it down!=E2=80=9D, = which is most likely to make matters worse. There=E2=80=99s already a = lot of stuff that goes into WAL without any detailed logging; if we ever = wanted to provide a comprehensive view of what data is in WAL that = should be handled separately.

The rest of the = stats all look important. In fact, I think there=E2=80=99s even more = stats that could be included (such as all frozen/visible pages skipped) = - even more reason to look at having separate controls for tracking = vacuum stats. There=E2=80=99s also an argument to be made for tracking = autovac separately from manual vacuum. So long-term we might want to = look at other ways to handle these stats, not only because of the large = number of stats, but because they would be updated very infrequently = compared to other stats counters. Ironically, the old stats system would = probably have been more than sufficient for these stats. Tracking them = in a real table might also be an option.

Is there a reason some fields are = omitted from pg_stat_vacuum_database? While some stats are certainly more = interesting at the per-relation level, I can=E2=80=99t really think of = any that don=E2=80=99t make sense at the database level as = well.

Looking at the per = table/index stats, I strongly dislike the use of the term =E2=80=9Cdelete=E2= =80=9D - it is a recipe for confusion with row deletion.. A much better = term is =E2=80=9Cremove=E2=80=9D or =E2=80=9Cremoved=E2=80=9D. I realize = the term =E2=80=9Cdelete=E2=80=9D is used in places in vacuum logging, = but IMO we should fix that as well instead of doubling-down on = it.

I think =E2=80=9Cinterrupts=E2=80=9D is also a very = confusing name - those fields should just be called = =E2=80=9Cerrors=E2=80=9D.

I realize =E2=80=9Crelname=E2=80=9D is being used for consistency = with pg_stat_all_(tables|indexes), but I=E2=80=99m not sure it makes = sense to double-down on that. Especially in pg_stat_vacuum_indexes, = where it=E2=80=99s not completely clear whether relname is referring to = the table or the index. I=E2=80=99m also inclined to say that the name = of the table should be included in = pg_stat_vacuum_indexes.

For all the views the docs should clarify that total_blks_written = means blocks written by vacuum, as opposed to the background writer. = Similarly they should clarify the difference between rel_blks_(read|hit) = and total_blks_(read|hit). In the case of pg_stat_vacuum_indexes it=E2=80=99= d be better if rel_blks_(read|hit) were called index_blks_(read|hit). = Although=E2=80=A6 if total_blks_* is actually the count across the table = and all the indexes I don=E2=80=99t know that we even need that counter. = I realize that not ever vacuum even looks at the indexes, but if we=E2=80=99= re going to go into that level of detail then we would (at minimum) need = to count the number of times a vacuum completely skipped scanning the = indexes.

Having = rev_all_(frozen|visible)_pages in the same view as vacuum stats will = confuse users into thinking that vacuum is clearing the bits. Those = fields really belong in pg_stat_all_tables.

Sadly index_vacuum_count is = may not useful at all at present. At minimum you=E2=80=99d need to know = the number of times vacuum had run in total. I realize that=E2=80=99s in = pg_stat_all_tables, but that doesn=E2=80=99t help if vacuum stats are = tracked or reset separately. At minimum the docs should mention them. = They also need to clarify if index_vacuum_count is incremented = per-index or per-pass (hopefully the later). Assuming it=E2=80=99s = per-pass, a better name for the field would be index_vacuum_passes, = index_passes, index_pass_count, or similar. But even with that we still = need a counter for the number of vacuums where index processing was = skipped.

Other items
First, thanks to everyone that=E2=80=99s put work into = this patch - it=E2=80=99s a big step forward. I certainly don=E2=80=99t = want the perfect to be the enemy of the good, but since the size of = these stats entries has already come up as a concern I want to consider = use cases that would still not be covered by this patch. I=E2=80=99m not = suggesting these need to be added now, but IMHO they=E2=80=99re logical = next steps (that would also mean more counters). The cases below would = probably mean at least doubling the number of vacuum-related counters, = at least at the table level.

First, there=E2=80=99s= still gaps in trying to track HOT; most notably a counter for how many = updates would never be HOT eligible because they modify indexes. = pg_stat_all_tables.n_tup_newpage_upd is really limited without that = info.

There should also be stats about unused = line pointers - in degenerate cases the lp array can consume a = significant portion of heap storage.

Monitoring = bloat would be a lot more accurate if vacuum reported total tuple length = for each run along with the total number of tuples it looked at. Having = that info would make it trivial to calculate average tuple size, which = could then be applied to reltuples and relpages to calculate how much = space would being lost to bloat.

Autovacuum will = self-terminate if it would block another process (unless it=E2=80=99s an = aggressive vacuum) - that=E2=80=99s definitely something that should be = tracked. Not just the number of times that happens, but also stats about = how much work was lost because of this.

Shrinking a = relation (what vacuum calls truncation, which is very confusing with the = truncate command) is a rather complex process that currently has no = visibility.

Tuning vacuum_freeze_min_age (and = the MXID variant) is rather complicated. We maybe have enough stats on = whether it could be set lower, but there=E2=80=99s no visibility on how = the settings affect how often vacuum decides to be aggressive. At = minimum, we should have stats on when vacuum is aggressive, especially = since it significantly changes the behavior of autovac.

I saw someone else already mentioned tuning vacuum memory = usage, but I=E2=80=99ll mention it again. Even if the issues = with index_vacuum_count are fixed that still only tells you if you = have a problem; it doesn=E2=80=99t give you a great idea of how much = more memory you need. The best you can do is assuming you need (number = of passes - 1) * current memory.

Speaking of which=E2=80= =A6 there should be stats on any time vacuum decided on it=E2=80=99s own = to skip index processing due to wraparound proximity.

I=E2=80=99m sure there=E2=80=99s some other use cases that = I=E2=80=99m not thinking of.
= --Apple-Mail=_8359B6D1-0916-4124-8EEC-65C9700F8F01--