From: Sami Imseih <[email protected]>
To: Masahiko Sawada <[email protected]>
Cc: Daniil Davydov <[email protected]>
Cc: Maxim Orlov <[email protected]>
Cc: Postgres hackers <[email protected]>
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date: Mon, 5 May 2025 19:21:07 -0500
Message-ID: <CAA5RZ0vfBg=c_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag@mail.gmail.com> (raw)
In-Reply-To: <CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>
References: <CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com>
	<CAD21AoCdx5ZNS_cO7bYz1Zfb+Kw1kuJV2wtewrz7T1pPpjcWGw@mail.gmail.com>
	<CAJDiXgi6ZQOoSEqj9RyZMEh+HHBtmW0+PHD85UNPtKch8ubvdg@mail.gmail.com>
	<CAD21AoBcoA-i-pJ_=y+jg14R8_QaJA1iwktCnu5i-C=yXDFPdA@mail.gmail.com>
	<CAJDiXgjnUdE6Sk4M0unmT+9dULyFAxcum2txQKpWTuo4uQ_oXQ@mail.gmail.com>
	<CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>

> On Sat, May 3, 2025 at 1:10 AM Daniil Davydov <[email protected]>
> wrote:
> >
> > On Sat, May 3, 2025 at 5:28 AM Masahiko Sawada <[email protected]>
> wrote:
> > >
> > > > In current implementation, the leader process sends a signal to the
> > > > a/v launcher, and the launcher tries to launch all requested workers.
> > > > But the number of workers never exceeds `autovacuum_max_workers`.
> > > > Thus, we will never have more a/v workers than in the standard case
> > > > (without this feature).
> > >
> > > I have concerns about this design. When autovacuuming on a single
> > > table consumes all available autovacuum_max_workers slots with
> > > parallel vacuum workers, the system becomes incapable of processing
> > > other tables. This means that when determining the appropriate
> > > autovacuum_max_workers value, users must consider not only the number
> > > of tables to be processed concurrently but also the potential number
> > > of parallel workers that might be launched. I think it would make
> > > more sense to maintain the existing autovacuum_max_workers parameter while
> > > introducing a new parameter that would either control the maximum
> > > number of parallel vacuum workers per autovacuum worker or set a
> > > system-wide cap on the total number of parallel vacuum workers.
> > >
> >
> > For now we have max_parallel_index_autovac_workers - this GUC limits
> > the number of parallel a/v workers that can process a single table. I
> > agree that the scenario you provided is problematic.
> > The proposal to limit the total number of supportive a/v workers seems
> > attractive to me (I'll implement it as an experiment).
> >
> > It seems to me that this question is becoming a key one. First we need
> > to determine the role of the user in the whole scheduling mechanism.
> > Should we allow users to determine priority? Will this priority apply
> > only within a single vacuuming cycle, or will it be more 'global'?
> > I guess I don't have enough expertise to determine this alone. I will
> > be glad to receive any suggestions.
>
> What I roughly imagined is that we don't need to change the entire
> autovacuum scheduling, but would like autovacuum workers to decide
> whether or not to use parallel vacuum during their vacuum operation
> based on GUC parameters (having a global effect) or storage parameters
> (having an effect on the particular table). The criteria for triggering
> parallel vacuum in autovacuum might need to be somewhat pessimistic so
> that we don't unnecessarily use parallel vacuum on many tables.


Perhaps we should provide only a reloption, so that only tables the user
has explicitly marked via that reloption can be autovacuumed in parallel?

This gives a targeted approach. Of course, if several of these opted-in
tables are autovacuumed at the same time, some may not get all the
workers they ask for, but that is no different from manually running
parallel vacuum on those tables at the same time.
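
To make the trade-off concrete, here is a rough sketch in Python. It is
purely illustrative and is not PostgreSQL code: the reloption value
(requested_workers), the shared pool size, and the first-come,
first-served policy are all assumptions of mine, not anything in the
patch.

```python
from dataclasses import dataclass


@dataclass
class TableRequest:
    name: str
    # Hypothetical per-table reloption value; 0 means "not opted in".
    requested_workers: int


def allocate_parallel_workers(requests, pool_size):
    """Grant workers first-come, first-served from a shared pool.

    pool_size is an assumed system-wide cap on parallel workers.
    """
    remaining = pool_size
    granted = {}
    for req in requests:
        if req.requested_workers <= 0:
            # Table has no reloption set: indexes are vacuumed serially.
            granted[req.name] = 0
            continue
        n = min(req.requested_workers, remaining)
        granted[req.name] = n
        remaining -= n
    return granted


requests = [
    TableRequest("orders", 4),
    TableRequest("events", 4),
    TableRequest("users", 0),
]
print(allocate_parallel_workers(requests, pool_size=6))
# {'orders': 4, 'events': 2, 'users': 0}
```

Here "events" opted in for 4 workers but only gets 2 because "orders"
got there first, which is the same kind of shortfall you can hit when
running several manual parallel vacuums at once.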

What do you think?

--
Sami
