From: Daniil Davydov <[email protected]>
To: Masahiko Sawada <[email protected]>
Cc: Maxim Orlov <[email protected]>
Cc: Postgres hackers <[email protected]>
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date: Tue, 6 May 2025 11:54:49 +0700
Message-ID: <CAJDiXghJNVXXBUKOVciT=GY4udJeRh4kbEJ2GY7v8aULO9w84w@mail.gmail.com> (raw)
In-Reply-To: <CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>
References: <CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com>
	<CAD21AoCdx5ZNS_cO7bYz1Zfb+Kw1kuJV2wtewrz7T1pPpjcWGw@mail.gmail.com>
	<CAJDiXgi6ZQOoSEqj9RyZMEh+HHBtmW0+PHD85UNPtKch8ubvdg@mail.gmail.com>
	<CAD21AoBcoA-i-pJ_=y+jg14R8_QaJA1iwktCnu5i-C=yXDFPdA@mail.gmail.com>
	<CAJDiXgjnUdE6Sk4M0unmT+9dULyFAxcum2txQKpWTuo4uQ_oXQ@mail.gmail.com>
	<CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>

On Tue, May 6, 2025 at 6:57 AM Masahiko Sawada <[email protected]> wrote:
>
> What I roughly imagined is that we don't need to change the entire
> autovacuum scheduling, but would like autovacuum workers to decide
> whether or not to use parallel vacuum during its vacuum operation
> based on GUC parameters (having a global effect) or storage parameters
> (having an effect on the particular table). The criteria of triggering
> parallel vacuum in autovacuum might need to be somewhat pessimistic so
> that we don't unnecessarily use parallel vacuum on many tables.
>

+1, I see it the same way. I will cover this topic in more detail in
my reply to Sami's message [1], so as not to repeat myself.

> > Here are my thoughts on this. A/v worker has a very simple role - it
> > is born after the launcher's request and must do exactly one 'task' -
> > vacuum table or participate in parallel index vacuum.
> > We also have a dedicated 'launcher' role, meaning the whole design
> > implies that only the launcher is able to launch processes.
> >
> > If we allow an a/v worker to use bgworkers, then:
> > 1) The a/v worker will go far beyond its responsibility.
> > 2) Its functionality will overlap with the functionality of the launcher.
>
> While I agree that the launcher process is responsible for launching
> autovacuum worker processes, I'm not sure it should be responsible for
> launching everything related to autovacuum. It's quite possible that we
> have parallel heap vacuum and processing the particular index with
> parallel workers in the future. The code could get more complex if we
> have the autovacuum launcher process launch such parallel workers too.
> I believe it's more straightforward to divide the responsibility so
> that the autovacuum launcher is responsible for launching
> autovacuum workers and autovacuum workers are responsible for
> vacuuming tables, no matter how they do that.

It sounds very tempting. At the very beginning I did exactly that (to
make sure that nothing would break in a parallel autovacuum). Only
later was it decided to abandon the use of bgworkers.
For now both approaches look reasonable to me. What do you think: will
others agree that we can give more responsibility to a/v workers?

> > 3) Resource consumption can jump dramatically, which is unexpected for
> > the user.
>
> What extra resources could be used if we use background workers
> instead of autovacuum workers?

I meant that more processes would participate in autovacuum than
autovacuum_max_workers indicates. And if an a/v worker uses additional
bgworkers, other operations cannot get those worker slots.
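
To put rough numbers on this, here is a small sketch (Python, used
only for the arithmetic). It assumes that every a/v worker may launch
up to `parallel_per_worker` bgworkers at the same time; that parameter
and the function name are hypothetical and do not come from the patch.

```python
def worst_case_usage(autovacuum_max_workers, parallel_per_worker,
                     max_worker_processes):
    """Return (processes doing autovacuum work, bgworker slots left for others).

    parallel_per_worker is a hypothetical per-a/v-worker limit,
    used here only for illustration.
    """
    # bgworkers that all a/v workers together could hold at once,
    # capped by the size of the shared bgworker pool
    bgworkers_taken = min(autovacuum_max_workers * parallel_per_worker,
                          max_worker_processes)
    total = autovacuum_max_workers + bgworkers_taken
    left_for_others = max_worker_processes - bgworkers_taken
    return total, left_for_others


# Stock defaults: autovacuum_max_workers = 3, max_worker_processes = 8
total, left = worst_case_usage(3, 2, 8)
print(total, left)  # 9 2
```

So with the default settings and two parallel workers per a/v worker,
up to 9 processes could be doing autovacuum work while
autovacuum_max_workers says 3, and only 2 bgworker slots would remain
for everything else (parallel queries, logical replication, etc.).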

> > Autovacuum will also be dependent on other resources
> > (bgworkers pool). The current design does not imply this.
>
> I see your point but I think it doesn't necessarily need to reflect it
> at the infrastructure layer. For example, we can internally allocate
> extra background worker slots for parallel vacuum workers based on
> max_parallel_index_autovac_workers in addition to
> max_worker_processes. Anyway we might need something to check or
> validate max_worker_processes value to make sure that every autovacuum
> worker can use the specified number of parallel workers for parallel
> vacuum.

I don't think we can provide all the supporting workers for every
parallel index vacuuming request. But I got your point: always keep
several bgworkers that only a/v workers can use if needed, and make the
size of this additional pool (depending on max_worker_processes)
user-configurable.
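
Just to check that we understand the reserved pool in the same way,
here is a toy model of it (Python as pseudocode, not a proposal for
the real implementation). The class, the `av_reserved` setting and the
rule that a/v workers draw only from the reserved part are my
assumptions for illustration. A request may be granted fewer workers
than it asked for, possibly zero.

```python
class BgworkerPool:
    """Toy model: bgworker slots with a part reserved for autovacuum."""

    def __init__(self, max_worker_processes, av_reserved):
        # av_reserved is a hypothetical user-configurable setting that
        # carves slots for a/v workers out of max_worker_processes.
        if not 0 <= av_reserved <= max_worker_processes:
            raise ValueError("av_reserved must be within max_worker_processes")
        self.free = {
            "autovacuum": av_reserved,
            "general": max_worker_processes - av_reserved,
        }

    def acquire(self, nrequested, is_autovacuum):
        """Grant up to nrequested slots; the caller must cope with fewer."""
        kind = "autovacuum" if is_autovacuum else "general"
        granted = min(nrequested, self.free[kind])
        self.free[kind] -= granted
        return granted

    def release(self, nworkers, is_autovacuum):
        """Return previously granted slots to the part they came from."""
        kind = "autovacuum" if is_autovacuum else "general"
        self.free[kind] += nworkers


pool = BgworkerPool(max_worker_processes=8, av_reserved=2)
print(pool.acquire(3, is_autovacuum=True))   # 2: only part of the request
print(pool.acquire(6, is_autovacuum=False))  # 6: general slots are untouched
```

In this model autovacuum can never starve other operations of
bgworkers, and the price is that an a/v worker must be ready to run
with fewer parallel workers than it requested.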

> > I wanted to create a patch that would fit into the existing mechanism
> > without drastic innovations. But if you think that the above is not so
> > important, then we can reuse VACUUM PARALLEL code and it would
> > simplify the final implementation.
>
> I'd suggest using the existing infrastructure if we can achieve the
> goal with it. If we find out there are some technical difficulties to
> implement it without new infrastructure, we can revisit this approach.

OK, I'll implement it in the near future and send a new patch to this
thread. I'd be glad if you took a look at it.

[1] https://www.postgresql.org/message-id/CAA5RZ0vfBg%3Dc_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag%40mail.g...

--
Best regards,
Daniil Davydov




