MIME-Version: 1.0
References: 
 <CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com>
 <CAD21AoCdx5ZNS_cO7bYz1Zfb+Kw1kuJV2wtewrz7T1pPpjcWGw@mail.gmail.com>
 <CAJDiXgi6ZQOoSEqj9RyZMEh+HHBtmW0+PHD85UNPtKch8ubvdg@mail.gmail.com>
 <CAD21AoBcoA-i-pJ_=y+jg14R8_QaJA1iwktCnu5i-C=yXDFPdA@mail.gmail.com>
 <CAJDiXgjnUdE6Sk4M0unmT+9dULyFAxcum2txQKpWTuo4uQ_oXQ@mail.gmail.com>
 <CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>
 <CAA5RZ0vfBg=c_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag@mail.gmail.com>
 <CAD21AoBgvUeWS8ZsXBahA1XdYayK6DJ6dx49d6Xpii-iH+Hrwg@mail.gmail.com>
 <CAA5RZ0vF+Lr-jU1LAZWTGUjboUETk8oLvaNBbA5ozX6dau+how@mail.gmail.com>
 <CAJDiXggueLSGMNRmLshbmFRfbo4jzks0W8bLDfUSRZ-61fPVEQ@mail.gmail.com>
 <CAFY6G8cJ=DRTX75pOGerH6sk39dRt+7MSH+y_qppDdhPs=qdQA@mail.gmail.com>
 <CAJDiXgg1t6wk9NjyMUTm1iKqM9GtdQ_wrEchBtz3xjWBZM8W8A@mail.gmail.com>
 <CAD21AoAC0=Xi38RQcAO4A+vdmoXToZMoHfbS=KLT49fAOTH_gA@mail.gmail.com>
 <CAJDiXgiD+AZKhJSn-FSRVQxtDLmJd95wDu4wtKniQF5==1JcjQ@mail.gmail.com>
 <CAD21AoAM8KsqNhrZYJuf7odvxcTC0TumXazJc-r_wC5KnDFDPg@mail.gmail.com>
 <CAJDiXghbcOC9OOj3ampxuyqXH0geggnosnrYUHGygkpss-RtxA@mail.gmail.com>
In-Reply-To: 
 <CAJDiXghbcOC9OOj3ampxuyqXH0geggnosnrYUHGygkpss-RtxA@mail.gmail.com>
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 17 Jun 2025 15:36:41 -0700
Message-ID: 
 <CAD21AoAPnq0vrcGgeN++r1GoL8Kza7jaGL=TNzuBn6+MkR=rUQ@mail.gmail.com>
Subject: Re: POC: Parallel processing of indexes in autovacuum
To: Daniil Davydov <3danissimo@gmail.com>
Cc: Matheus Alcantara <matheusssilv97@gmail.com>,
 Sami Imseih <samimseih@gmail.com>,
	Maxim Orlov <orlovmg@gmail.com>,
 Postgres hackers <pgsql-hackers@lists.postgresql.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: 
 <https://www.postgresql.org/message-id/CAD21AoAPnq0vrcGgeN%2B%2Br1GoL8Kza7jaGL%3DTNzuBn6%2BMkR%3DrUQ%40mail.gmail.com>
Precedence: bulk

On Sun, May 25, 2025 at 10:22=E2=80=AFAM Daniil Davydov <3danissimo@gmail.c=
om> wrote:
>
> Hi,
>
> On Fri, May 23, 2025 at 6:12=E2=80=AFAM Masahiko Sawada <sawada.mshk@gmai=
l.com> wrote:
> >
> > On Thu, May 22, 2025 at 12:44=E2=80=AFAM Daniil Davydov <3danissimo@gma=
il.com> wrote:
> > >
> > > On Wed, May 21, 2025 at 5:30=E2=80=AFAM Masahiko Sawada <sawada.mshk@=
gmail.com> wrote:
> > > >
> > > > I find that the name "autovacuum_reserved_workers_num" is generic. =
It
> > > > would be better to have a more specific name for parallel vacuum su=
ch
> > > > as autovacuum_max_parallel_workers. This parameter is related to
> > > > neither autovacuum_worker_slots nor autovacuum_max_workers, which
> > > > seems fine to me. Also, max_parallel_maintenance_workers doesn't
> > > > affect this parameter.
> > >
> > > This was my headache when I created names for variables. Autovacuum
> > > initially implies parallelism, because we have several parallel a/v
> > > workers.
> >
> > I'm not sure if it's parallelism. We can have multiple autovacuum
> > workers simultaneously working on different tables, which seems not
> > parallelism to me.
>
> Hm, I didn't thought about the 'parallelism' definition in this way.
> But I see your point - the next v4 patch will contain the naming that
> you suggest.
>
> >
> > > So I think that parameter like
> > > `autovacuum_max_parallel_workers` will confuse somebody.
> > > If we want to have a more specific name, I would prefer
> > > `max_parallel_index_autovacuum_workers`.
> >
> > It's better not to use 'index' as we're trying to extend parallel
> > vacuum to heap scanning/vacuuming as well[1].
>
> OK, I'll fix it.
>
> > > > +   /*
> > > > +    * If we are running autovacuum - decide whether we need to pro=
cess indexes
> > > > +    * of table with given oid in parallel.
> > > > +    */
> > > > +   if (AmAutoVacuumWorkerProcess() &&
> > > > +       params->index_cleanup !=3D VACOPTVALUE_DISABLED &&
> > > > +       RelationAllowsParallelIdxAutovac(rel))
> > > >
> > > > I think that this should be done in autovacuum code.
> > >
> > > We need params->index cleanup variable to decide whether we need to
> > > use parallel index a/v. In autovacuum.c we have this code :
> > > ***
> > > /*
> > >  * index_cleanup and truncate are unspecified at first in autovacuum.
> > >  * They will be filled in with usable values using their reloptions
> > >  * (or reloption defaults) later.
> > >  */
> > > tab->at_params.index_cleanup =3D VACOPTVALUE_UNSPECIFIED;
> > > tab->at_params.truncate =3D VACOPTVALUE_UNSPECIFIED;
> > > ***
> > > This variable is filled in inside the `vacuum_rel` function, so I
> > > think we should keep the above logic in vacuum.c.
> >
> > I guess that we can specify the parallel degree even if index_cleanup
> > is still UNSPECIFIED. vacuum_rel() would then decide whether to use
> > index vacuuming and vacuumlazy.c would decide whether to use parallel
> > vacuum based on the specified parallel degree and index_cleanup value.
> >
> > >
> > > > +#define AV_PARALLEL_DEADTUP_THRESHOLD  1024
> > > >
> > > > These fixed values really useful in common cases? I think we alread=
y
> > > > have an optimization where we skip vacuum indexes if the table has
> > > > fewer dead tuples (see BYPASS_THRESHOLD_PAGES).
> > >
> > > When we allocate dead items (and optionally init parallel autocuum) w=
e
> > > don't have sane value for `vacrel->lpdead_item_pages` (which should b=
e
> > > compared with BYPASS_THRESHOLD_PAGES).
> > > The only criterion that we can focus on is the number of dead tuples
> > > indicated in the PgStat_StatTabEntry.
> >
> > My point is that this criterion might not be useful. We have the
> > bypass optimization for index vacuuming and having many dead tuples
> > doesn't necessarily mean index vacuuming taking a long time. For
> > example, even if the table has a few dead tuples, index vacuuming
> > could take a very long time and parallel index vacuuming would help
> > the situation, if the table is very large and has many indexes.
>
> That sounds reasonable. I'll fix it.
>
> > > But autovacuum (as I think) should work as stable as possible and
> > > `unnoticed` by other processes. Thus, we must :
> > > 1) Compute resources (such as the number of parallel workers for a
> > > single table's indexes vacuuming) as efficiently as possible.
> > > 2) Provide a guarantee that as many tables as possible (among
> > > requested) will be processed in parallel.
> > >
> > > (1) can be achieved by calculating the parameters on the fly.
> > > NUM_INDEXES_PER_PARALLEL_WORKER is a rough mock. I can provide more
> > > accurate value in the near future.
> >
> > I think it requires more things than the number of indexes on the
> > table to achieve (1). Suppose that there is a very large table that
> > gets updates heavily and has a few indexes. If users want to avoid the
> > table from being bloated, it would be a reasonable idea to use
> > parallel vacuum during autovacuum and it would not be a good idea to
> > disallow using parallel vacuum solely because it doesn't have more
> > than 30 indexes. On the other hand, if the table had got many updates
> > but not so now, users might want to use resources for autovacuums on
> > other tables. We might need to consider autovacuum frequencies per
> > table, the statistics of the previous autovacuum, or system loads etc.
> > So I think that in order to achieve (1) we might need more statistics
> > and using only NUM_INDEXES_PER_PARALLEL_WORKER would not work fine.
> >
>
> It's hard for me to imagine exactly how extended statistics will help
> us track such situations.
> It seems that for any of our heuristics, it will be possible to come
> up with a counter example.
> Maybe we can give advices (via logs) to the user? But for such an
> idea, tests should be conducted so that we can understand when
> resource consumption becomes ineffective.
> I guess that we need to agree on an implementation before conducting such=
 tests.
>
> > > (2) can be achieved by workers reserving - we know that N workers
> > > (from bgworkers pool) are *always* at our disposal. And when we use
> > > such workers we are not dependent on other operations in the cluster
> > > and we don't interfere with other operations by taking resources away
> > > from them.
> >
> > Reserving some bgworkers for autovacuum could make sense. But I think
> > it's better to implement it in a general way as it could be useful in
> > other use cases too. That is, it might be a good to implement
> > infrastructure so that any PostgreSQL code (possibly including
> > extensions) can request allocating a pool of bgworkers for specific
> > usage and use bgworkers from them.
>
> Reserving infrastructure is an ambitious idea. I am not sure that we
> should implement it within this thread and feature.
> Maybe we should create a separate thread for it and as a
> justification, refer to parallel autovacuum?
>
> -----
> Thanks everybody for feedback! I attach a v4 patch to this letter.
> Main features :
> 1) 'parallel_autovacuum_workers' reloption - integer value, that sets
> the maximum number of parallel a/v workers that can be taken from
> bgworkers pool in order to process this table.
> 2) 'max_parallel_autovacuum_workers' - GUC variable, that sets the
> maximum total number of parallel a/v workers, that can be taken from
> bgworkers pool.
> 3) Parallel autovacuum does not try to use thresholds like
 > NUM_INDEXES_PER_PARALLEL_WORKER and AV_PARALLEL_DEADTUP_THRESHOLD.
> 4) Parallel autovacuum now can report statistics like "planned vs. launch=
ed".
> 5) For now I got rid of the 'reserving' idea, so now autovacuum
> leaders are competing with everyone for parallel workers from the
> bgworkers pool.
>
> What do you think about this implementation?
>

I think it basically makes sense to me. A few comments:

---
The patch implements max_parallel_autovacuum_workers as a
PGC_POSTMASTER parameter but can we make it PGC_SIGHUP? I think we
don't necessarily need to make it a PGC_POSTMATER since it actually
doesn't affect how much shared memory we need to allocate.

---
I think it's better to have the prefix "autovacuum" for the new GUC
parameter for better consistency with other autovacuum-related GUC
parameters.

---
 #include "storage/spin.h"
@@ -514,6 +515,11 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
    {
        WaitForParallelWorkersToFinish(pcxt);
        WaitForParallelWorkersToExit(pcxt);
+
+       /* Release all launched (i.e. reserved) parallel autovacuum workers=
. */
+       if (AmAutoVacuumWorkerProcess())
+           ParallelAutoVacuumReleaseWorkers(pcxt->nworkers_launched);
+
        pcxt->nworkers_launched =3D 0;
        if (pcxt->known_attached_workers)
        {
@@ -1002,6 +1008,11 @@ DestroyParallelContext(ParallelContext *pcxt)
     */
    HOLD_INTERRUPTS();
    WaitForParallelWorkersToExit(pcxt);
+
+   /* Release all launched (i.e. reserved) parallel autovacuum workers. */
+   if (AmAutoVacuumWorkerProcess())
+       ParallelAutoVacuumReleaseWorkers(pcxt->nworkers_launched);
+
    RESUME_INTERRUPTS();

I think that it's better to release workers in vacuumparallel.c rather
than parallel.c.

Regards,

--=20
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com