From: Sami Imseih <[email protected]>
To: Daniil Davydov <[email protected]>
Cc: Masahiko Sawada <[email protected]>
Cc: Matheus Alcantara <[email protected]>
Cc: Maxim Orlov <[email protected]>
Cc: Postgres hackers <[email protected]>
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date: Thu, 22 May 2025 12:48:07 -0500
Message-ID: <CAA5RZ0twipMFOv0ag9Hx4z1APoo5mRu7T1t+OebAMtJmhttaig@mail.gmail.com>
In-Reply-To: <CAJDiXgiD+AZKhJSn-FSRVQxtDLmJd95wDu4wtKniQF5==1JcjQ@mail.gmail.com>
References: <CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com>
<CAD21AoCdx5ZNS_cO7bYz1Zfb+Kw1kuJV2wtewrz7T1pPpjcWGw@mail.gmail.com>
<CAJDiXgi6ZQOoSEqj9RyZMEh+HHBtmW0+PHD85UNPtKch8ubvdg@mail.gmail.com>
<CAD21AoBcoA-i-pJ_=y+jg14R8_QaJA1iwktCnu5i-C=yXDFPdA@mail.gmail.com>
<CAJDiXgjnUdE6Sk4M0unmT+9dULyFAxcum2txQKpWTuo4uQ_oXQ@mail.gmail.com>
<CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>
<CAA5RZ0vfBg=c_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag@mail.gmail.com>
<CAD21AoBgvUeWS8ZsXBahA1XdYayK6DJ6dx49d6Xpii-iH+Hrwg@mail.gmail.com>
<CAA5RZ0vF+Lr-jU1LAZWTGUjboUETk8oLvaNBbA5ozX6dau+how@mail.gmail.com>
<CAJDiXggueLSGMNRmLshbmFRfbo4jzks0W8bLDfUSRZ-61fPVEQ@mail.gmail.com>
<CAFY6G8cJ=DRTX75pOGerH6sk39dRt+7MSH+y_qppDdhPs=qdQA@mail.gmail.com>
<CAJDiXgg1t6wk9NjyMUTm1iKqM9GtdQ_wrEchBtz3xjWBZM8W8A@mail.gmail.com>
<CAD21AoAC0=Xi38RQcAO4A+vdmoXToZMoHfbS=KLT49fAOTH_gA@mail.gmail.com>
<CAJDiXgiD+AZKhJSn-FSRVQxtDLmJd95wDu4wtKniQF5==1JcjQ@mail.gmail.com>
I started looking at the patch, but I have some high-level thoughts I would
like to share before looking further.
> > I find that the name "autovacuum_reserved_workers_num" is generic. It
> > would be better to have a more specific name for parallel vacuum such
> > as autovacuum_max_parallel_workers. This parameter is related to
> > neither autovacuum_worker_slots nor autovacuum_max_workers, which
> > seems fine to me. Also, max_parallel_maintenance_workers doesn't
> > affect this parameter.
> > .......
> > I've also considered some alternative names. If we were to use
> > parallel_maintenance_workers, it sounds like it controls the parallel
> > degree for all operations using max_parallel_maintenance_workers,
> > including CREATE INDEX. Similarly, vacuum_parallel_workers could be
> > interpreted as affecting both autovacuum and manual VACUUM commands,
> > suggesting that when users run "VACUUM (PARALLEL) t", the system would
> > use their specified value for the parallel degree. I prefer
> > autovacuum_parallel_workers or vacuum_parallel_workers.
> >
>
> This was my headache when I created names for variables. Autovacuum
> initially implies parallelism, because we have several parallel a/v
> workers. So I think that parameter like
> `autovacuum_max_parallel_workers` will confuse somebody.
> If we want to have a more specific name, I would prefer
> `max_parallel_index_autovacuum_workers`.
I don't think we should have a separate pool of workers for parallel
autovacuum. At the end of the day, these are parallel workers, and they should
be capped by max_parallel_workers. I think it will be confusing if we claim
these are parallel workers but they come from a different pool.
I envision another GUC, such as "max_parallel_autovacuum_workers" (which I
think is a better name), that matches the behavior of
"max_parallel_maintenance_workers". That is, the autovacuum workers keep
their existing behavior (launching a worker per table), and if they do need
to vacuum in parallel, they can draw from a pool of parallel workers.
With that said, I think the reloption should be a number of parallel workers
rather than a boolean. Take the example of a user with 3 tables that they
want (auto)vacuum to process in parallel, each with 4 parallel workers if
available. However, so as not to overload the system, they cap
'max_parallel_maintenance_workers' at something like 8. If all 3 tables
happen to be autovacuumed at the same time, there may not be enough parallel
workers, so one table will be the loser and be vacuumed serially. That is
acceptable, and a/v logging (and perhaps other stat views) should show this
behavior: workers planned vs. workers launched.
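To make the arithmetic in that example concrete, here is a rough Python
simulation. It is not PostgreSQL code: the function and class names are made
up for illustration, and it assumes workers are granted first come, first
served from one shared pool and are held for the whole vacuum, with all three
tables being vacuumed at the same time.

```python
from dataclasses import dataclass


@dataclass
class VacuumResult:
    table: str
    planned: int   # workers requested via the (proposed) per-table reloption
    launched: int  # workers actually obtained from the shared pool


def allocate_parallel_workers(requests, pool_size):
    """Hypothetical model: grant workers in arrival order from a shared pool.

    requests  -- list of (table_name, planned_workers) tuples
    pool_size -- total parallel workers available (the cap of 8 in the example)
    """
    free = pool_size
    results = []
    for table, planned in requests:
        launched = min(planned, free)  # never exceed what is left in the pool
        free -= launched
        results.append(VacuumResult(table, planned, launched))
    return results


# Three tables, each asking for 4 workers, against a pool capped at 8.
for r in allocate_parallel_workers([("t1", 4), ("t2", 4), ("t3", 4)], pool_size=8):
    mode = "parallel" if r.launched > 0 else "serial"
    print(f"{r.table}: workers planned={r.planned}, launched={r.launched} ({mode})")
```

Under these assumptions t1 and t2 each get 4 workers and t3 gets none, so it
falls back to a serial vacuum. The planned/launched pair on each line is the
kind of information the logging could report.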
Thoughts?
--
Sami Imseih
Amazon Web Services (AWS)