MIME-Version: 1.0
References: 
 <CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com>
 <CAD21AoCdx5ZNS_cO7bYz1Zfb+Kw1kuJV2wtewrz7T1pPpjcWGw@mail.gmail.com>
 <CAJDiXgi6ZQOoSEqj9RyZMEh+HHBtmW0+PHD85UNPtKch8ubvdg@mail.gmail.com>
 <CAD21AoBcoA-i-pJ_=y+jg14R8_QaJA1iwktCnu5i-C=yXDFPdA@mail.gmail.com>
 <CAJDiXgjnUdE6Sk4M0unmT+9dULyFAxcum2txQKpWTuo4uQ_oXQ@mail.gmail.com>
 <CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>
In-Reply-To: 
 <CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>
From: Sami Imseih <samimseih@gmail.com>
Date: Mon, 5 May 2025 19:21:07 -0500
Message-ID: 
 <CAA5RZ0vfBg=c_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag@mail.gmail.com>
Subject: Re: POC: Parallel processing of indexes in autovacuum
To: Masahiko Sawada <sawada.mshk@gmail.com>
Cc: Daniil Davydov <3danissimo@gmail.com>, Maxim Orlov <orlovmg@gmail.com>,
	Postgres hackers <pgsql-hackers@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000639b0306346c98e2"
Archived-At: 
 <https://www.postgresql.org/message-id/CAA5RZ0vfBg%3Dc_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag%40mail.gmail.com>
Precedence: bulk

--000000000000639b0306346c98e2
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

> On Sat, May 3, 2025 at 1:10 AM Daniil Davydov <3danissimo@gmail.com>
> wrote:
> >
> > On Sat, May 3, 2025 at 5:28 AM Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
> > >
> > > > In current implementation, the leader process sends a signal to the
> > > > a/v launcher, and the launcher tries to launch all requested workers.
> > > > But the number of workers never exceeds `autovacuum_max_workers`.
> > > > Thus, we will never have more a/v workers than in the standard case
> > > > (without this feature).
> > >
> > > I have concerns about this design. When autovacuuming on a single
> > > table consumes all available autovacuum_max_workers slots with
> > > parallel vacuum workers, the system becomes incapable of processing
> > > other tables. This means that when determining the appropriate
> > > autovacuum_max_workers value, users must consider not only the number
> > > of tables to be processed concurrently but also the potential number
> > > of parallel workers that might be launched. I think it would make more
> > > sense to maintain the existing autovacuum_max_workers parameter while
> > > introducing a new parameter that would either control the maximum
> > > number of parallel vacuum workers per autovacuum worker or set a
> > > system-wide cap on the total number of parallel vacuum workers.
> > >
> >
> > For now we have max_parallel_index_autovac_workers - this GUC limits
> > the number of parallel a/v workers that can process a single table. I
> > agree that the scenario you provided is problematic.
> > The proposal to limit the total number of supportive a/v workers seems
> > attractive to me (I'll implement it as an experiment).
> >
> > It seems to me that this question is becoming a key one. First we need
> > to determine the role of the user in the whole scheduling mechanism.
> > Should we allow users to determine priority? Will this priority apply
> > only within a single vacuuming cycle, or will it be more 'global'?
> > I guess I don't have enough expertise to determine this alone. I will
> > be glad to receive any suggestions.
>
> What I roughly imagined is that we don't need to change the entire
> autovacuum scheduling, but would like autovacuum workers to decide
> whether or not to use parallel vacuum during their vacuum operations
> based on GUC parameters (having a global effect) or storage parameters
> (having an effect on the particular table). The criteria of triggering
> parallel vacuum in autovacuum might need to be somewhat pessimistic so
> that we don't unnecessarily use parallel vacuum on many tables.


Perhaps we should provide only a reloption, so that only tables the user
explicitly marks via that reloption can be autovacuumed in parallel?

This gives a targeted approach. Of course, if several of these opted-in
tables are autovacuumed at the same time, some may not get all the workers
they ask for. But that is no different from manually running parallel VACUUM
on those tables at the same time.
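
To make the "some may not get all the workers" case concrete, here is a
rough sketch. It is not taken from the patch: grant_parallel_workers(),
"requested" (standing in for a per-table reloption value, 0 meaning the
table has not opted in) and "pool_free" (standing in for whatever limit
on parallel workers ends up applying) are all hypothetical names.

```c
/*
 * Hypothetical sketch, not code from the patch.
 *
 * requested: parallel workers asked for by the table's reloption
 *            (0 or less means the table has not opted in).
 * pool_free: parallel workers still available under an assumed
 *            shared limit; decremented by the number granted.
 *
 * Returns the number of workers actually granted.
 */
static int
grant_parallel_workers(int requested, int *pool_free)
{
    int granted;

    if (requested <= 0 || *pool_free <= 0)
        return 0;               /* not opted in, or nothing left */

    /* Grant the request, capped by what is still free. */
    granted = (requested < *pool_free) ? requested : *pool_free;
    *pool_free -= granted;

    return granted;
}
```

With 4 workers free and two opted-in tables each asking for 3, the first
call grants 3 and the second grants 1; a table without the reloption gets
0 and is vacuumed serially, as today.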

What do you think?

--
Sami


--000000000000639b0306346c98e2--