MIME-Version: 1.0
References: 
 <CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com>
 <CAD21AoCdx5ZNS_cO7bYz1Zfb+Kw1kuJV2wtewrz7T1pPpjcWGw@mail.gmail.com>
 <CAJDiXgi6ZQOoSEqj9RyZMEh+HHBtmW0+PHD85UNPtKch8ubvdg@mail.gmail.com>
 <CAD21AoBcoA-i-pJ_=y+jg14R8_QaJA1iwktCnu5i-C=yXDFPdA@mail.gmail.com>
 <CAJDiXgjnUdE6Sk4M0unmT+9dULyFAxcum2txQKpWTuo4uQ_oXQ@mail.gmail.com>
 <CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>
 <CAA5RZ0vfBg=c_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag@mail.gmail.com>
 <CAD21AoBgvUeWS8ZsXBahA1XdYayK6DJ6dx49d6Xpii-iH+Hrwg@mail.gmail.com>
 <CAA5RZ0vF+Lr-jU1LAZWTGUjboUETk8oLvaNBbA5ozX6dau+how@mail.gmail.com>
 <CAJDiXggueLSGMNRmLshbmFRfbo4jzks0W8bLDfUSRZ-61fPVEQ@mail.gmail.com>
 <CAFY6G8cJ=DRTX75pOGerH6sk39dRt+7MSH+y_qppDdhPs=qdQA@mail.gmail.com>
 <CAJDiXgg1t6wk9NjyMUTm1iKqM9GtdQ_wrEchBtz3xjWBZM8W8A@mail.gmail.com>
 <CAD21AoAC0=Xi38RQcAO4A+vdmoXToZMoHfbS=KLT49fAOTH_gA@mail.gmail.com>
 <CAJDiXgiD+AZKhJSn-FSRVQxtDLmJd95wDu4wtKniQF5==1JcjQ@mail.gmail.com>
 <CAD21AoAM8KsqNhrZYJuf7odvxcTC0TumXazJc-r_wC5KnDFDPg@mail.gmail.com>
 <CAJDiXghbcOC9OOj3ampxuyqXH0geggnosnrYUHGygkpss-RtxA@mail.gmail.com>
 <CAD21AoAPnq0vrcGgeN++r1GoL8Kza7jaGL=TNzuBn6+MkR=rUQ@mail.gmail.com>
 <CAJDiXghmsbTmnm--9B5bbuZXa1OL7SZ0HYppX3tx9XsdwfJBhA@mail.gmail.com>
 <DB3C67FCRLOO.1R5NLYCNEA6BF@gmail.com>
 <CAJDiXgiYiX+azuR76DcVx8fZn57m_4v6cB14-GW34mWa=qudFQ@mail.gmail.com>
 <CAD21AoDtPpkkQ_h1yf4oTx1qn4SRdTeVY3qs+9J07fYqa_4Gww@mail.gmail.com>
 <CAJDiXgi7KB7wSQ=Ux=ngdaCvJnJ5x-ehvTyiuZez+5uKHtV6iQ@mail.gmail.com>
 <CAD21AoCcHKKXsr9Oh736ejckqqS1i430xGEyJ=JP5OL0ExyP1A@mail.gmail.com>
 <CAJDiXghaFT_1sSv3q8mjyZ_RLZDgiogg0mWRvLxSWvkUi2CcLg@mail.gmail.com>
 <CAA5RZ0u63W41OmcEO+HLs4CSo-Sd3J+Q-4=04iud8V=xX4iUrA@mail.gmail.com>
 <CAJDiXgin1TXniVGJKzOTA=F9K342uVfm6O0EmubTVB=F+XSrbA@mail.gmail.com>
 <CAD21AoDadzAwibxf-+urjx=XL+eVu8=Ut-Lh2GxXUt32LbPG3Q@mail.gmail.com>
 <CAD21AoD6HhraqhOgkQJOrr0ixZkAZuqJRpzGv-B+_-ad6d5aPw@mail.gmail.com>
 <CAJDiXgiGSpqMQSOx-cVO_LtcB5GWHBy9ph7oOR4ebbX8A==kgw@mail.gmail.com>
 <CAD21AoBRRXbNJEvCjS-0XZgCEeRBzQPKmrSDjJ3wZ8TN28vaCQ@mail.gmail.com>
 <CAPpHfduBJfMcojvmYHUo8b_C=0cxRy1N+tNiNGoA3RAZq2ApaA@mail.gmail.com>
 <CAD21AoC82NeHKXc965pPUZO2eyo1U7P6cmfRJbrcPDcnd7_6hw@mail.gmail.com>
 <CAJDiXghP2kXnEz+cj3rAWNM3NdKSB_4WtnngFXpVz2omPhGr5A@mail.gmail.com>
 <CAD21AoA0bnRZC_OqKMnH-Ln+OZ9z9k56j2c_MXj8pw69O-wkBw@mail.gmail.com>
 <CAA5RZ0sSXDza7_nUUbhHL_Sws+M+HR1daKJPXHpdLuNCkwUgUg@mail.gmail.com>
 <CAJDiXggrBsbzOisf+Nu8pZkYGrpUZaFbosL1Wbm3kKxzTm4xgw@mail.gmail.com>
 <CAA5RZ0tbiPcgQEjnhdnjz6qSjfRsGrr8jGCaMcrMaoPpax3wig@mail.gmail.com>
 <CAJDiXgjt5ZmK2uvS0E8Ztt5ePYmq8Ze_dG05Zo2NUsKLHCEuYA@mail.gmail.com>
 <CAD21AoB7v5tLPXLK=qmtt6PaEC1f+Fb-gh+MwAbXfm6x4eZGNw@mail.gmail.com>
 <CAJDiXghwtUbiFnAh3nSaxTk8KFupQuMbp+g4z3wOLoQfMuqgDg@mail.gmail.com>
 <CAJDiXgjoNd4BF19HNY_FAcDUqiqsfw8cGhNOJwBxahB8P38E3Q@mail.gmail.com>
 <CAD21AoBT1LWqPZkcHpVMVh0ZOXUneO=p61t0i8cQ+kOP9qfODQ@mail.gmail.com>
 <CAJDiXggL=J0nV7PfBsMW9+UOU3KUp1jNBM9Gov1JvAX7aG_U1g@mail.gmail.com>
 <CAD21AoDz-1Zf9DOJJrdcB2=eNA4UdywthkowNp_dHmOGC-yV_g@mail.gmail.com>
 <CAJDiXgjzphJ313=aDwbvryHpmTi6AqE+-5crysTtzKv01-vkzA@mail.gmail.com>
 <CAD21AoD7_4gsQ2a82zO3SaRwjdw_3tyiYDHNFPUKQ5DAA5HOtA@mail.gmail.com>
 <CAJDiXggY1QzNde6_HhpzneLc9dYqmWZ+PY39cuBXYdcCTuoJBA@mail.gmail.com>
 <CAD21AoCFPiS2jcMA1JaV1kT8xrGz5BpN7iBP_gCgRuaANEbciA@mail.gmail.com>
 <CAJDiXgh6jmNGR3uOB_6YeGhNkR2=HdTdEYjmHXdumNzyY4MckQ@mail.gmail.com>
 <CAD21AoDs3SOXeAEoCRizfEKybpRkE7t7poX0+iZ6MM1MFWMsfA@mail.gmail.com>
 <CAJDiXgjTkuqSPerC_nasxDz6d2Komf1ipYKV6SupDRnc9yhO9w@mail.gmail.com>
 <CAD21AoAXMjX03h5K84u0heBLU+fqGgWBGBDwnBDGSs=DhyF9pQ@mail.gmail.com>
 <CAJDiXghjZEAYboGhujgGvY9=RiFD01ERHVVF+NQMuuAKVZDmDQ@mail.gmail.com>
 <CAD21AoAD+N5SxBr0qL7TeWnvq4iYmFT=DyWdNLQPB-XntYkwEg@mail.gmail.com>
 <CAJDiXgjgn87sH1-MmONPKkeYJG83C0ChrYkYn9UcRonLhOOfOw@mail.gmail.com>
 <CAD21AoCoJYauWO78M-CGdHpYfcqEZVV5a1Z-7wWB=-G-x8EVFg@mail.gmail.com>
 <CAJDiXghaazbrQMZZS08d9Ffh2y4w05TgH9dpBhqChv1qNTp+xA@mail.gmail.com>
 <CAD21AoDbaNtLrFRxG9OG5WrBd7DCs4q+CfJd8AJTBEqRri4WeQ@mail.gmail.com>
 <CAJDiXgjjd1jL86B--AyRo2tDM1Wiu+7Pduwh5d0u_UM8GRugvw@mail.gmail.com>
 <CAD21AoBo_wS7y0X7_7ajEFkptzo9ZrF8RFNRnu2Xe8XL74o0SQ@mail.gmail.com>
 <CAJDiXggH1bW=4n+55CGLvs_sRU4SYNXwYLZ37wvJ5H_3yURSPw@mail.gmail.com>
 <CAD21AoDxhN8Z6Lx1ZicBXKkbMsRQqEXiq4ALs4uaD648iSvXoA@mail.gmail.com>
 <CAJDiXgh3Dg2f5k3xRJnzoY39jQENUhh125ArYapXkSu5D7JJuw@mail.gmail.com>
 <CAD21AoBYc7L7W4dRdxeoJzOH5OgpiCAtKz-54iX4Ufn8PnQoww@mail.gmail.com>
 <CAJDiXgi8X-DMb92v5WHLCNxDHxH9gO8WQxOMtdpmU7X=WXCiuQ@mail.gmail.com>
 <CAD21AoDKxs0UrTwa3rkP+kE9AzccabpK7G-Tk=HYneaFTZBtiA@mail.gmail.com>
 <CALj2ACUJ0TtYWtFuXXVf0aLES8tfZePXnB8WQ=0KCrNaABzQVg@mail.gmail.com>
 <CAJDiXgj=-R1z7H7+npm-o+q6YBkr5_6Qe=1wcy47ovAqej4TkA@mail.gmail.com>
In-Reply-To: 
 <CAJDiXgj=-R1z7H7+npm-o+q6YBkr5_6Qe=1wcy47ovAqej4TkA@mail.gmail.com>
From: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Date: Sun, 29 Mar 2026 17:17:21 -0700
Message-ID: 
 <CAHg+QDehxaJEd1Yp1MpW8UO71xmbasy7t2GZGvqOYwkr0md8DQ@mail.gmail.com>
Subject: Re: POC: Parallel processing of indexes in autovacuum
To: Daniil Davydov <3danissimo@gmail.com>
Cc: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>,
	Masahiko Sawada <sawada.mshk@gmail.com>, Sami Imseih <samimseih@gmail.com>,
	Alexander Korotkov <aekorotkov@gmail.com>,
 Matheus Alcantara <matheusssilv97@gmail.com>,
	Maxim Orlov <orlovmg@gmail.com>,
 Postgres hackers <pgsql-hackers@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000f01fd9064e32c6e7"
Archived-At: 
 <https://www.postgresql.org/message-id/CAHg%2BQDehxaJEd1Yp1MpW8UO71xmbasy7t2GZGvqOYwkr0md8DQ%40mail.gmail.com>
Precedence: bulk

--000000000000f01fd9064e32c6e7
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi

On Sat, Mar 28, 2026 at 4:11=E2=80=AFAM Daniil Davydov <3danissimo@gmail.co=
m> wrote:

> Hi,
>
> On Thu, Mar 26, 2026 at 5:43=E2=80=AFAM Masahiko Sawada <sawada.mshk@gmai=
l.com>
> wrote:
> >
> > On Wed, Mar 25, 2026 at 12:45=E2=80=AFAM Daniil Davydov <3danissimo@gma=
il.com>
> wrote:
> > >
> > >  Searching for arguments in
> > > favor of opt-in style, I asked for help from another person who has
> been
> > > managing the setup of highload systems for decades. He promised to
> share his
> > > opinion next week.
> >
> > Given that we have one and half weeks before the feature freeze, I
> > think it's better to complete the project first before waiting for
> > his/her comments next week. Even if we finish this feature with the
> > opt-out style, we can hear more opinions on it and change the default
> > behavior as the change would be privial. What do you think?
> >
>
> Sure, if we can change the default value after the feature freeze, I don'=
t
> mind leaving our parameter in opt-out style by now.
>
> > I've squashed all patches except for the documentation patch as I
> > assume you're working on it. The attached fixup patch contains several
> > changes: using opt-out style, comment improvements, and fixing typos
> > etc.
> >
>
> Thank you very much for the proposed fixes!
> I like the way you have changed nparallel_workers calculation
> (autovacuum.c).
> Forcing parallel workers to always read shared cost params at the first
> time
> is a good decision. All comments changes are also LGTM.
>
> The only place that I have changed is reloptions.c :
> As you have explained, it is not appropriate to use the "overrides" wordi=
ng
> in the reloption's description, so I decided to return an old one.
>
> On Fri, Mar 27, 2026 at 10:54=E2=80=AFAM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > On Wed, Mar 25, 2026 at 3:43=E2=80=AFPM Masahiko Sawada <sawada.mshk@gm=
ail.com>
> wrote:
> > >
> > > Given that we have one and half weeks before the feature freeze, I
> > > think it's better to complete the project first before waiting for
> > > his/her comments next week. Even if we finish this feature with the
> > > opt-out style, we can hear more opinions on it and change the default
> > > behavior as the change would be privial. What do you think?
> > >
> > > I've squashed all patches except for the documentation patch as I
> > > assume you're working on it. The attached fixup patch contains severa=
l
> > > changes: using opt-out style, comment improvements, and fixing typos
> > > etc.
> >
> > +1 for enabling this feature by default. When enough CPU is available,
> > vacuuming multiple indexes of a table in parallel in autovacuum
> > definitely speeds things up.
>
> Yes, for sure. But I have concerns that enabling parallel a/v for everyon=
e
> will cause the parallel workers shortage during processing of the most hu=
ge
> tables.
>
> > Thank you for sending the latest patches. I quickly reviewed the v31
> > patches. Here are some comments.
> >
> > 1/ +       {"autovacuum_parallel_workers", RELOPT_TYPE_INT,
> >
> > I haven't looked at the whole thread, but do we all think we need this
> > as a relopt? IMHO, we can wait for field experience and introduce this
> > later.
>
> I think that we should leave both reloption and the config parameter.
> Getting rid from the reloption will greatly reduce the ability of users t=
o
> tune this feature. I'm afraid that this may lead to people not using
> parallel
> autovacuum.
>
> > I'm having a hard time finding a use-case where one wants to
> > disable the indexes at the table level. If there was already an
> > agreement, I agree to commit to that decision.
>
> You can read discussion from [1] to the current message in order to dive
> into
> the question.
>
> To make the long story short, I think that the most common use case for
> this
> feature is allowing parallel a/v for 2-3 tables, each of which has ~100
> indexes. The rest of the tables do not require parallel processing (at
> least
> it's a much lower priority for them).
>
> At the same time, Masahiko-san thinks that only the system should decide
> which
> tables will be processed in parallel. System's decision should be based o=
n
> the
> number of indexes and a few other config parameters (e.g.
> min_parallel_index_scan_size). Thus, possibly many tables will be able to
> be
> processed in parallel.
>
> (Both opinions are pretty simplified).
>
> >
> > 2/  +   /*
> > +    * If 'true' then we are running parallel autovacuum. Otherwise, we
> are
> > +    * running parallel maintenence VACUUM.
> > +    */
> > +   bool        is_autovacuum;
> > +
> >
> > The variable name looks a bit confusing. How about we rely on
> > AmAutoVacuumWorkerProcess() and avoid the bool in shared memory?
>
> This variable is needed for parallel workers, which are taken from the
> bgworkers pool. I.e. AmAutovacuumWorker() will return 'false' for them.
> We need the "is_autovacuum" variable in order to understand exactly what
> this
> process was started for (VACUUM PARALLEL or parallel autovacuum).
>
>
> Thanks everyone for the review!
> Please, see an updated set of patches :
> As I promised, I created a dedicated chapter for Parallel Vacuum
> description.
> Both maintenance VACUUM and autovacuum now refer to this chapter.
>
> I am pretty inexperienced in the documentation writing, so forgive me if
> something is out of code style.
>
> [1]
> https://www.postgresql.org/message-id/CAJDiXggH1bW%3D4n%2B55CGLvs_sRU4SYN=
XwYLZ37wvJ5H_3yURSPw%40mail.gmail.com


Thank you for working on this; it is a very useful feature. Sharing a few
thoughts:

1. In parallel_vacuum_compute_workers, shouldn't we also cap the worker
   count by max_parallel_workers to avoid wasting DSM resources?
2. Is it intentional that other autovacuum workers do not yield cost limits
   to the parallel autovacuum workers? Cost limits are first distributed
   equally among the autovacuum workers, and the parallel workers then
   share that portion. Therefore, parallel workers will be heavily
   throttled. IIUC, this problem doesn't exist with manual vacuum. If we
   don't fix this, we should at least document it.
3. Additionally, is there a point where, given the cost limits, launching
   additional workers becomes counterproductive compared to running fewer
   workers? If so, should we prevent it?
4. Would it make sense to add a table-level override to disable parallelism
   or set the parallel worker count?
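
To make point 2 concrete, here is a toy model of the sharing described
above. It is only a sketch of my reading, not PostgreSQL code: the function
names are made up, and it assumes an even split of the cost limit across
active autovacuum workers, with each leader's parallel workers drawing from
that leader's share.

```python
def leader_share(cost_limit: int, active_autovacuum_workers: int) -> float:
    """Even split of the cost limit across active autovacuum workers (assumed)."""
    if active_autovacuum_workers < 1:
        raise ValueError("need at least one active autovacuum worker")
    return cost_limit / active_autovacuum_workers


def per_process_share(cost_limit: int, active_autovacuum_workers: int,
                      parallel_workers: int) -> float:
    """Average budget per process when a leader and its parallel workers
    all draw from the leader's single share (assumed)."""
    group_size = 1 + parallel_workers  # the leader plus its parallel workers
    return leader_share(cost_limit, active_autovacuum_workers) / group_size


# Hypothetical scenario: cost limit 200, three busy autovacuum workers.
for nworkers in (0, 2, 4, 7):
    print(nworkers, round(per_process_share(200, 3, nworkers), 2))
```

Under these assumptions, adding parallel workers never enlarges the total
budget; it only divides the same share among more processes.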


I ran some perf tests to show the improvements with parallel autovacuum;
the setup and results are shared below.

System Configuration
--------------------
Hardware:
  CPU: 16 cores
  RAM: 128 GB
  Storage: NVMe SSDs
  OS: Ubuntu Linux

Workload Description
--------------------
Table: avtest
  - 5,000,000 rows
  - 9 columns: id (bigint PK), col1-col5 (int), col6 (text),
    col7 (timestamp), padding (text, 50 bytes)
  - 8 indexes:
      avtest_pkey  (col: id)        107 MB
      idx_av_col7  (col: col7)      107 MB
      idx_av_col2  (col: col2)       56 MB
      idx_av_col4  (col: col4)       56 MB
      idx_av_col5  (col: col5)       56 MB
      idx_av_col1  (col: col1)       56 MB
      idx_av_col3  (col: col3)       56 MB
      idx_av_col6  (col: col6)       35 MB
  - Total size: 1171 MB

Each test iteration:
  1. Delete 2,000,000 rows (40%) using: DELETE WHERE id % 5 IN (1, 2)
  2. CHECKPOINT to flush dirty pages
  3. Trigger autovacuum by setting autovacuum_vacuum_threshold =3D 100 and
     autovacuum_vacuum_scale_factor =3D 0 on the table
  4. Wait for autovacuum to complete (detected via server log)
  5. Re-insert the deleted rows and VACUUM to restore the table for the
     next run
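
As a sanity check on steps 1 and 3, the arithmetic can be sketched as
follows. The sketch assumes ids run from 1 to 5,000,000 without gaps, and
it uses the documented trigger rule (base threshold plus scale factor times
the number of tuples), ignoring any other caps. The function name is made
up for illustration.

```python
TOTAL_ROWS = 5_000_000

# Rows matched by "id % 5 IN (1, 2)", assuming ids 1..TOTAL_ROWS with no gaps.
deleted = (len(range(1, TOTAL_ROWS + 1, 5))
           + len(range(2, TOTAL_ROWS + 1, 5)))


def vacuum_trigger_point(threshold: int, scale_factor: float,
                         reltuples: int) -> float:
    """Dead-tuple count above which autovacuum vacuums the table."""
    return threshold + scale_factor * reltuples


trigger = vacuum_trigger_point(100, 0.0, TOTAL_ROWS)
print(deleted, deleted / TOTAL_ROWS, trigger, deleted > trigger)
```

With these settings the delete leaves 2,000,000 dead tuples against a
trigger point of 100, so the table qualifies on the next autovacuum pass.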


Test Methodology
----------------
Worker configurations tested: 0, 2, 4, 7 parallel workers
  (7 is the maximum: nindexes - 1, since the leader always handles one
  index)
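
The ceiling above, combined with the max_parallel_workers cap asked about
in point 1, could look roughly like this. This is a simplified sketch, not
the logic of parallel_vacuum_compute_workers: the function name and its
parameters are hypothetical, and index-size checks are left out.

```python
def planned_workers(nindexes: int, requested: int,
                    max_parallel_workers: int) -> int:
    """Workers to launch: the leader keeps one index, the rest are capped."""
    if nindexes < 2:
        return 0  # nothing to hand off to a parallel worker
    return max(0, min(nindexes - 1, requested, max_parallel_workers))


# 8 indexes as in this test, max_parallel_workers of 16.
for requested in (0, 2, 4, 7, 12):
    print(requested, planned_workers(8, requested, 16))
```

For the table used here, any request above 7 is reduced to 7.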

Two experiments were run with different cost-based vacuum delay settings:

  Experiment A: cost_limit=3D200, cost_delay=3D2ms
  Experiment B: cost_limit=3D60,  cost_delay=3D2ms
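
For context, the two settings imply different throttling ceilings. The
sketch below assumes the usual cost-based delay behaviour (accumulate cost
up to the limit, then sleep for the delay) and treats the work itself as
free, so it gives an upper bound only. The function name is made up.

```python
def cost_units_per_second(cost_limit: int, cost_delay_ms: float) -> float:
    """Upper bound on cost units per second: one full budget per sleep."""
    return cost_limit * 1000.0 / cost_delay_ms


experiment_a = cost_units_per_second(200, 2.0)
experiment_b = cost_units_per_second(60, 2.0)
print(experiment_a, experiment_b, experiment_a / experiment_b)
```

That is 100,000 versus 30,000 cost units per second, a ratio of about 3.3
between the two experiments.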

Common server settings for both experiments:
  shared_buffers        =3D 120 GB  (entire dataset fits in shared buffers)
  maintenance_work_mem  =3D 1 GB
  max_wal_size          =3D 100 GB  (prevents checkpoints during vacuum)
  min_wal_size          =3D 10 GB
  checkpoint_timeout    =3D 1 hour  (prevents time-based checkpoints)
  wal_buffers           =3D 128 MB
  max_parallel_workers  =3D 16
  max_worker_processes  =3D 24
  autovacuum_naptime    =3D 1s

Between every single run:
  1. PostgreSQL server is fully stopped (pg_ctl stop -m fast)
  2. OS page cache is dropped (echo 3 > /proc/sys/vm/drop_caches)
  3. Server is restarted with a clean log file
  4. After DELETE and CHECKPOINT, the server is stopped again, OS caches
     dropped again, and the server restarted -- so vacuum starts fully cold
  5. The autovacuum_max_parallel_workers GUC is reloaded via pg_ctl reload

Each configuration was tested for 5 iterations.

Timing is extracted from the PostgreSQL server log "system usage" line that
autovacuum emits at completion. This line reports the elapsed wall-clock
time and the CPU time of the autovacuum leader process.
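
Pulling the numbers out of that line can be done with a small parser like
the one below. It is a sketch that assumes the line has the shape "system
usage: CPU: user: N s, system: N s, elapsed: N s"; the sample line in the
code is made up for illustration and is not taken from these runs.

```python
import re
from typing import Dict, Optional

USAGE_RE = re.compile(
    r"system usage: CPU: user: (?P<user>\d+\.\d+) s, "
    r"system: (?P<system>\d+\.\d+) s, elapsed: (?P<elapsed>\d+\.\d+) s"
)


def parse_usage(line: str) -> Optional[Dict[str, float]]:
    """Return user/system/elapsed seconds, or None if the line differs."""
    match = USAGE_RE.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


# Made-up sample line, for illustration only.
sample = "system usage: CPU: user: 1.23 s, system: 0.45 s, elapsed: 6.78 s"
print(parse_usage(sample))
```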


Results: Experiment A (cost_limit=3D200, cost_delay=3D2ms)
------------------------------------------------------

Workers  Iter1    Iter2    Iter3    Iter4    Iter5    Avg(s)  Speedup
-------  ------   ------   ------   ------   ------   ------  -------
0        66.21    79.11    66.27    77.11    66.30    71.00   1.00x
2        66.55    53.27    52.66    55.74    55.71    56.78   1.25x
4        51.50    51.74    65.07    52.06    70.25    58.12   1.22x
7        50.05    50.35    50.04    50.12    50.07    50.12   1.41x

CPU usage (leader process only):
Workers  Avg CPU user  Avg CPU sys
-------  -----------   ----------
0        3.04s         1.70s
2        1.24s         1.50s
4        0.78s         1.49s
7        0.79s         1.48s


Results: Experiment B (cost_limit=3D60, cost_delay=3D2ms)
-----------------------------------------------------

Workers  Iter1    Iter2    Iter3    Iter4    Iter5    Avg(s)  Speedup
-------  ------   ------   ------   ------   ------   ------  -------
0        199.00   195.26   191.44   191.90   191.67   193.85  1.00x
2        160.68   181.33   176.85   167.84   159.47   169.23  1.14x
4        154.02   165.02   174.33   164.16   156.53   162.81  1.19x
7        148.49   158.68   160.66   154.37   149.20   154.28  1.25x

CPU usage (leader process only):
Workers  Avg CPU user  Avg CPU sys
-------  -----------   ----------
0        3.06s         1.90s
2        1.28s         1.72s
4        0.80s         1.69s
7        0.82s         1.68s
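
The averages and speedups in the two timing tables can be recomputed from
the per-iteration numbers with a short script. The helper name is made up,
and the last digit may differ from the tables depending on rounding.

```python
from statistics import mean

# Elapsed seconds per iteration, copied from the two results tables above.
RUNS = {
    "A": {
        0: [66.21, 79.11, 66.27, 77.11, 66.30],
        2: [66.55, 53.27, 52.66, 55.74, 55.71],
        4: [51.50, 51.74, 65.07, 52.06, 70.25],
        7: [50.05, 50.35, 50.04, 50.12, 50.07],
    },
    "B": {
        0: [199.00, 195.26, 191.44, 191.90, 191.67],
        2: [160.68, 181.33, 176.85, 167.84, 159.47],
        4: [154.02, 165.02, 174.33, 164.16, 156.53],
        7: [148.49, 158.68, 160.66, 154.37, 149.20],
    },
}


def summarize(runs):
    """Map workers -> (average, speedup vs 0 workers, max - min spread)."""
    baseline = mean(runs[0])
    return {
        workers: (mean(times), baseline / mean(times), max(times) - min(times))
        for workers, times in runs.items()
    }


for name, runs in RUNS.items():
    for workers, (avg, speedup, spread) in summarize(runs).items():
        print(f"{name} workers:{workers} avg:{avg:.2f}s "
              f"speedup:{speedup:.2f}x spread:{spread:.2f}s")
```

The spread column makes the run-to-run noise visible: in Experiment A the
4-worker runs range from 51.50s to 70.25s, while the 7-worker runs stay
within 0.31s of each other.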


*Observations:*

1. Parallel autovacuum was faster than the serial run in every
   configuration tested. With cost_limit=3D200 and 7 workers, vacuum
   completes 1.41x faster (71s -> 50s). With cost_limit=3D60, the speedup
   is 1.25x (194s -> 154s). The 2- and 4-worker runs in Experiment A were
   noisy, and 4 workers averaged slightly slower than 2 there (58.12s vs
   56.78s).
2. The benefit appears to come from parallelizing index vacuum. With 8
   indexes totaling ~530 MB, parallel workers scan indexes concurrently
   instead of the leader scanning them one by one. The leader's CPU user
   time drops from ~3s to ~0.8s as index work is offloaded.


Thanks,
Satya

--000000000000f01fd9064e32c6e7--