Message-ID: <6732acf8ce0f31025b535ae1a64568750924a887.camel@moonset.ru>
Subject: Re: Vacuum statistics
From: Andrei Zubkov <zubkov@moonset.ru>
To: Jim Nasby <jnasby@upgrade.com>, Alexander Korotkov <aekorotkov@gmail.com>
Cc: Alena Rybakina <a.rybakina@postgrespro.ru>, Masahiko Sawada
 <sawada.mshk@gmail.com>, Melanie Plageman <melanieplageman@gmail.com>, jian
 he <jian.universality@gmail.com>, pgsql-hackers
 <pgsql-hackers@postgresql.org>,  a.lepikhov@postgrespro.ru
Date: Tue, 29 Oct 2024 15:40:09 +0300
In-Reply-To: <0B6CBF4C-CC2A-4200-9126-CE3A390D938B@upgrade.com>
References: <bee34abe-fe87-49b3-8523-8ae19e641ccf@yandex.ru>
	 <53c47c2d-72a5-44f2-900c-9973b2af1808@tantorlabs.com>
	 <c82e7776-7841-42a0-9247-878584174750@tantorlabs.com>
	 <4a902cea-54fb-41b5-b208-b84731a5f577@postgrespro.ru>
	 <CACJufxHb_YGCp=pVH6DZcpk9yML+SueffPeaRbX2LzXZVahd_w@mail.gmail.com>
	 <CACJufxE6yAP+jUm4_GyKp7gUCkzbuJ9HGB-rR=92_hcLLZ9KTg@mail.gmail.com>
	 <092adec6-4eae-4bd4-bd0d-473a9df1282b@tantorlabs.com>
	 <3deae1bd-ad84-4459-a26e-04c9136b84e9@postgrespro.ru>
	 <CAPpHfdug0s2MD7bBf-5nDQGn1WBxCKiTmZyGfxHz_7P0CDOjbg@mail.gmail.com>
	 <c4e4e305-7119-4183-b49a-d7092f4efba3@postgrespro.ru>
	 <f05a9031-f923-41b1-9bf9-81204cb56561@postgrespro.ru>
	 <CACJufxFU4ej3iGtGg3GDqmGzRmTVq3d9RGq+ibLfQoS8E3hJEQ@mail.gmail.com>
	 <9b10c6d3-52c4-4eef-b67c-c33442667729@postgrespro.ru>
	 <CAD21AoAVK7DwTZLfhwuRhTGgR=_ASu5YshEg_Cmpojk5ZdZ3tA@mail.gmail.com>
	 <CAAKRu_auu=xt4w3Mm_jW-voJunZgno6XDDohH6hhRc4Z9dGdYQ@mail.gmail.com>
	 <CAD21AoD66b3u28n=73kudgMp5wiGiyYUN9LuC9z2ka6YTru5Gw@mail.gmail.com>
	 <9485d892-fd04-4e3a-ac24-7dd767cb7333@postgrespro.ru>
	 <CAPpHfduwY8-fp34CuO9O57ouCs1K=Gn1rTnuG4AaWYhEo6nXyw@mail.gmail.com>
	 <0B6CBF4C-CC2A-4200-9126-CE3A390D938B@upgrade.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
User-Agent: Evolution 3.46.4-2 
MIME-Version: 1.0
Archived-At: <https://www.postgresql.org/message-id/6732acf8ce0f31025b535ae1a64568750924a887.camel%40moonset.ru>
Precedence: bulk

Hi,

Thanks for your attention to our patch!

On Mon, 2024-10-28 at 16:03 -0500, Jim Nasby wrote:
> > Yes, but as Masahiko-san pointed out, PgStat_TableCounts is almost
> > tripled in space. =C2=A0That a huge change from having no statistics on
> > vacuum to have it in much more detail than everything else we
> > currently have. =C2=A0I think the feasible way might be to introduce
> > some
> > most demanded statistics first then see how it goes.
>=20
> Looking at the stats I do think the WAL stats are probably not
> helpful. First, there=E2=80=99s nothing users can do to tune how much WAL=
 is
> generated by vacuum. Second, this introduces the risk of users saying
> =E2=80=9CWow, vacuum is creating a lot of WAL! I=E2=80=99m going to turn =
it down!=E2=80=9D,
> which is most likely to make matters worse. There=E2=80=99s already a lot=
 of
> stuff that goes into WAL without any detailed logging; if we ever
> wanted to provide a comprehensive view of what data is in WAL that
> should be handled separately.

Yes, there is nothing we can directly do with WAL generated by vacuum,
but WAL generation is the part of vacuum work, and it will indirectly
affected by the changes of vacuum settings. So, WAL statistics is one
more dimension of vacuum workload. Also WAL stat is universal metric
which is measured cluster-wide and on the statement-level with
pg_stat_statements. Vacuum WAL counters will explain the part of
difference between those metrics. Besides vacuum WAL counters can be
used to locate abnormal vacuum behavior caused by a bug or the data
corruption. I think if the DBA is smart enough to look at vacuum WAL
generated stats and to understand what it means, the decision to
disable the autovacuum due to its WAL generation is unlikely.

Anyway I think some stats can be excluded to save some memory. The
first candidates are the system_time and user_time fields. Those are
very valuable, but are measured by the rusage stats, which won't be
available on all platforms. I think total_time and delay_time would be
sufficient.
The second is the interrupts field. It is needed for monitoring to know
do we have them or not, so tracking them on the database level will do
the trick. Interrupt is quite rare event, so once the monitoring system
will catch one the DBA can go to the server log for the details.

It seems there is another way. If the vacuum stats doesn't seems to be
mandatory in all systems, maybe we should add some hooks to the vacuum
so that vacuum statistics tracking can be done in an extension. I don't
think it is a good idea, because vacuum stats seems to me as mandatory
as the vacuum process itself.

> Is there a reason some fields are omitted
> from=C2=A0pg_stat_vacuum_database? While some stats are certainly more
> interesting at the per-relation level, I can=E2=80=99t really think of an=
y
> that don=E2=80=99t make sense at the database level as well.

Some of the metrics are table-specific, some index-specific, so we
moved to the database level metrics more or less specific to the whole
database. Can you tell what stats you want to see at the database
level?

> Looking at the per table/index stats, I strongly dislike the use of
> the term =E2=80=9Cdelete=E2=80=9D - it is a recipe for confusion with row=
 deletion..
> A much better term is =E2=80=9Cremove=E2=80=9D or =E2=80=9Cremoved=E2=80=
=9D. I realize the term
> =E2=80=9Cdelete=E2=80=9D is used in places in vacuum logging, but IMO we =
should fix
> that as well instead of doubling-down on it.

Yes, this point was discussed in our team, and it seems confusing to me
too. We decided to name it as it is named in the code and to get
feedback from the community. Now we get one. Thank you. Now we should
discuss it and choose the best one. My personal choice is "removed".


> I realize =E2=80=9Crelname=E2=80=9D is being used for consistency with
> pg_stat_all_(tables|indexes), but I=E2=80=99m not sure it makes sense to
> double-down on that. Especially in pg_stat_vacuum_indexes, where it=E2=80=
=99s
> not completely clear whether relname is referring to the table or the
> index. I=E2=80=99m also inclined to say that the name of the table should=
 be
> included in pg_stat_vacuum_indexes.

Agreed. Table name is needed in the index view.


> For all the views the docs should clarify that total_blks_written
> means blocks written by vacuum, as opposed to the background Ywriter.

We have the "Number of database blocks written by vacuum operations
performed on this table" in the docs now. Do you mean we should
specifically note the vacuum process here?

> Similarly they should clarify the difference between
> rel_blks_(read|hit) and total_blks_(read|hit). In the case of
> pg_stat_vacuum_indexes it=E2=80=99d be better if rel_blks_(read|hit) were
> called index_blks_(read|hit). Although=E2=80=A6 if total_blks_* is actual=
ly
> the count across the table and all the indexes I don=E2=80=99t know that =
we
> even need that counter. I realize that not ever vacuum even looks at
> the indexes, but if we=E2=80=99re going to go into that level of detail t=
hen
> we would (at minimum) need to count the number of times a vacuum
> completely skipped scanning the indexes.

It is not clear to me enough. The stats described just as it is -
rel_blocks_* tracks blocks of the current heap, and total_* is for the
whole database blocks - not just tables and indexes, vacuum do some
work (quite a little) in the catalog and this work is counted here too.
Usually this stat won't be helpful, but maybe we can catch unusual
vacuum behavior using this stat.


> Having rev_all_(frozen|visible)_pages in the same view as vacuum
> stats will confuse users into thinking that vacuum is clearing the
> bits. Those fields really belong in pg_stat_all_tables.

Agreed.

> Sadly=C2=A0index_vacuum_count is may not useful at all at present. At
> minimum you=E2=80=99d need to know the number of times vacuum had run in
> total. I realize that=E2=80=99s in pg_stat_all_tables, but that doesn=E2=
=80=99t help
> if vacuum stats are tracked or reset separately.

I'm in doubt - is it really possible to reset the vacuum stats
independent of pg_stat_all_tables?

> At minimum the docs should mention them. They also need to clarify
> if=C2=A0index_vacuum_count is incremented per-index or per-pass (hopefull=
y
> the later). Assuming it=E2=80=99s per-pass, a better name for the field w=
ould
> be index_vacuum_passes, index_passes, index_pass_count, or similar.
> But even with that we still need a counter for the number of vacuums
> where index processing was skipped.

Agreed, the "index_passes" looks good to me, and index processing skip
counter looks good.


> First, there=E2=80=99s still gaps in trying to track HOT; most notably a
> counter for how many updates would never be HOT eligible because they
> modify indexes. pg_stat_all_tables.n_tup_newpage_upd is really
> limited without that info.

Nice catch, I'll think about it. Those are not directly connected to
the vacuum workload but those are important.

> There should also be stats about unused line pointers - in degenerate
> cases the lp array can consume a significant portion of heap storage.
>=20
> Monitoring bloat would be a lot more accurate if vacuum reported
> total tuple length for each run along with the total number of tuples
> it looked at. Having that info would make it trivial to calculate
> average tuple size, which could then be applied to reltuples and
> relpages to calculate how much space would being lost to bloat.

Yes, bloat tracking is in our plans. Right now it is not clear enough
how to do it in the most reliable and convenient way.

> Autovacuum will self-terminate if it would block another process
> (unless it=E2=80=99s an aggressive vacuum) - that=E2=80=99s definitely so=
mething that
> should be tracked. Not just the number of times that happens, but
> also stats about how much work was lost because of this.

Agreed.

> Shrinking a relation (what vacuum calls truncation, which is very
> confusing with the truncate command) is a rather complex process that
> currently has no visibility.

In this patch table truncation can be seen in the "pages_removed" field
of "pg_stat_vacuum_tables" at least as the cumulative number of removed
pages. It is not clear enough, but it is visible.


> Tuning=C2=A0vacuum_freeze_min_age (and the MXID variant) is rather
> complicated. We maybe have enough stats on whether it could be set
> lower, but there=E2=80=99s no visibility on how the settings affect how o=
ften
> vacuum decides to be aggressive. At minimum, we should have stats on
> when vacuum is aggressive, especially since it significantly changes
> the behavior of autovac.

When you say "agressive" do you mean the number of times when the
vacuum was processing the table with the FREEZE intention? I think this
is needed too.

> I saw someone else already mentioned tuning vacuum memory usage, but
> I=E2=80=99ll mention it again. Even if the issues with=C2=A0index_vacuum_=
count are
> fixed that still only tells you if you have a problem; it doesn=E2=80=99t
> give you a great idea of how much more memory you need. The best you
> can do is assuming you need (number of passes - 1) * current memory.

Do you think such approach is insufficient? It seems we do not need
byte-to-byte accuracy here.

> Speaking of which=E2=80=A6 there should be stats on any time vacuum decid=
ed
> on it=E2=80=99s own to skip index processing due to wraparound proximity.

Maybe we should just count the number of times when the vacuum was
started to prevent wraparound?

Jim, thank you for such detailed review of our patch!

--=20
regards, Andrei Zubkov
Postgres Professional