MIME-Version: 1.0
References: 
 <CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com>
 <CAD21AoCdx5ZNS_cO7bYz1Zfb+Kw1kuJV2wtewrz7T1pPpjcWGw@mail.gmail.com>
 <CAJDiXgi6ZQOoSEqj9RyZMEh+HHBtmW0+PHD85UNPtKch8ubvdg@mail.gmail.com>
 <CAD21AoBcoA-i-pJ_=y+jg14R8_QaJA1iwktCnu5i-C=yXDFPdA@mail.gmail.com>
 <CAJDiXgjnUdE6Sk4M0unmT+9dULyFAxcum2txQKpWTuo4uQ_oXQ@mail.gmail.com>
 <CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>
 <CAA5RZ0vfBg=c_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag@mail.gmail.com>
 <CAD21AoBgvUeWS8ZsXBahA1XdYayK6DJ6dx49d6Xpii-iH+Hrwg@mail.gmail.com>
 <CAA5RZ0vF+Lr-jU1LAZWTGUjboUETk8oLvaNBbA5ozX6dau+how@mail.gmail.com>
 <CAJDiXggueLSGMNRmLshbmFRfbo4jzks0W8bLDfUSRZ-61fPVEQ@mail.gmail.com>
 <CAFY6G8cJ=DRTX75pOGerH6sk39dRt+7MSH+y_qppDdhPs=qdQA@mail.gmail.com>
 <CAJDiXgg1t6wk9NjyMUTm1iKqM9GtdQ_wrEchBtz3xjWBZM8W8A@mail.gmail.com>
 <CAD21AoAC0=Xi38RQcAO4A+vdmoXToZMoHfbS=KLT49fAOTH_gA@mail.gmail.com>
 <CAJDiXgiD+AZKhJSn-FSRVQxtDLmJd95wDu4wtKniQF5==1JcjQ@mail.gmail.com>
 <CAD21AoAM8KsqNhrZYJuf7odvxcTC0TumXazJc-r_wC5KnDFDPg@mail.gmail.com>
 <CAJDiXghbcOC9OOj3ampxuyqXH0geggnosnrYUHGygkpss-RtxA@mail.gmail.com>
 <CAD21AoAPnq0vrcGgeN++r1GoL8Kza7jaGL=TNzuBn6+MkR=rUQ@mail.gmail.com>
 <CAJDiXghmsbTmnm--9B5bbuZXa1OL7SZ0HYppX3tx9XsdwfJBhA@mail.gmail.com>
 <DB3C67FCRLOO.1R5NLYCNEA6BF@gmail.com>
 <CAJDiXgiYiX+azuR76DcVx8fZn57m_4v6cB14-GW34mWa=qudFQ@mail.gmail.com>
 <CAD21AoDtPpkkQ_h1yf4oTx1qn4SRdTeVY3qs+9J07fYqa_4Gww@mail.gmail.com>
 <CAJDiXgi7KB7wSQ=Ux=ngdaCvJnJ5x-ehvTyiuZez+5uKHtV6iQ@mail.gmail.com>
 <CAD21AoCcHKKXsr9Oh736ejckqqS1i430xGEyJ=JP5OL0ExyP1A@mail.gmail.com>
 <CAJDiXghaFT_1sSv3q8mjyZ_RLZDgiogg0mWRvLxSWvkUi2CcLg@mail.gmail.com>
 <CAA5RZ0u63W41OmcEO+HLs4CSo-Sd3J+Q-4=04iud8V=xX4iUrA@mail.gmail.com>
 <CAJDiXgin1TXniVGJKzOTA=F9K342uVfm6O0EmubTVB=F+XSrbA@mail.gmail.com>
 <CAD21AoDadzAwibxf-+urjx=XL+eVu8=Ut-Lh2GxXUt32LbPG3Q@mail.gmail.com>
 <CAD21AoD6HhraqhOgkQJOrr0ixZkAZuqJRpzGv-B+_-ad6d5aPw@mail.gmail.com>
 <CAJDiXgiGSpqMQSOx-cVO_LtcB5GWHBy9ph7oOR4ebbX8A==kgw@mail.gmail.com>
 <CAD21AoBRRXbNJEvCjS-0XZgCEeRBzQPKmrSDjJ3wZ8TN28vaCQ@mail.gmail.com>
 <CAPpHfduBJfMcojvmYHUo8b_C=0cxRy1N+tNiNGoA3RAZq2ApaA@mail.gmail.com>
 <CAD21AoC82NeHKXc965pPUZO2eyo1U7P6cmfRJbrcPDcnd7_6hw@mail.gmail.com>
 <CAJDiXghP2kXnEz+cj3rAWNM3NdKSB_4WtnngFXpVz2omPhGr5A@mail.gmail.com>
 <CAD21AoA0bnRZC_OqKMnH-Ln+OZ9z9k56j2c_MXj8pw69O-wkBw@mail.gmail.com>
 <CAA5RZ0sSXDza7_nUUbhHL_Sws+M+HR1daKJPXHpdLuNCkwUgUg@mail.gmail.com>
 <CAJDiXggrBsbzOisf+Nu8pZkYGrpUZaFbosL1Wbm3kKxzTm4xgw@mail.gmail.com>
 <CAA5RZ0tbiPcgQEjnhdnjz6qSjfRsGrr8jGCaMcrMaoPpax3wig@mail.gmail.com>
 <CAJDiXgjt5ZmK2uvS0E8Ztt5ePYmq8Ze_dG05Zo2NUsKLHCEuYA@mail.gmail.com>
 <CAD21AoB7v5tLPXLK=qmtt6PaEC1f+Fb-gh+MwAbXfm6x4eZGNw@mail.gmail.com>
 <CAJDiXghwtUbiFnAh3nSaxTk8KFupQuMbp+g4z3wOLoQfMuqgDg@mail.gmail.com>
 <CAJDiXgjoNd4BF19HNY_FAcDUqiqsfw8cGhNOJwBxahB8P38E3Q@mail.gmail.com>
 <CAD21AoBT1LWqPZkcHpVMVh0ZOXUneO=p61t0i8cQ+kOP9qfODQ@mail.gmail.com>
 <CAJDiXggL=J0nV7PfBsMW9+UOU3KUp1jNBM9Gov1JvAX7aG_U1g@mail.gmail.com>
 <CAD21AoDz-1Zf9DOJJrdcB2=eNA4UdywthkowNp_dHmOGC-yV_g@mail.gmail.com>
 <CAJDiXgjzphJ313=aDwbvryHpmTi6AqE+-5crysTtzKv01-vkzA@mail.gmail.com>
 <CAD21AoD7_4gsQ2a82zO3SaRwjdw_3tyiYDHNFPUKQ5DAA5HOtA@mail.gmail.com>
 <CAJDiXggY1QzNde6_HhpzneLc9dYqmWZ+PY39cuBXYdcCTuoJBA@mail.gmail.com>
 <CAD21AoCFPiS2jcMA1JaV1kT8xrGz5BpN7iBP_gCgRuaANEbciA@mail.gmail.com>
 <CAJDiXgh6jmNGR3uOB_6YeGhNkR2=HdTdEYjmHXdumNzyY4MckQ@mail.gmail.com>
 <CAD21AoDs3SOXeAEoCRizfEKybpRkE7t7poX0+iZ6MM1MFWMsfA@mail.gmail.com>
 <CAJDiXgjTkuqSPerC_nasxDz6d2Komf1ipYKV6SupDRnc9yhO9w@mail.gmail.com>
 <CAD21AoAXMjX03h5K84u0heBLU+fqGgWBGBDwnBDGSs=DhyF9pQ@mail.gmail.com>
 <CAJDiXghjZEAYboGhujgGvY9=RiFD01ERHVVF+NQMuuAKVZDmDQ@mail.gmail.com>
 <CAD21AoAD+N5SxBr0qL7TeWnvq4iYmFT=DyWdNLQPB-XntYkwEg@mail.gmail.com>
 <CAJDiXgjgn87sH1-MmONPKkeYJG83C0ChrYkYn9UcRonLhOOfOw@mail.gmail.com>
 <CAD21AoCoJYauWO78M-CGdHpYfcqEZVV5a1Z-7wWB=-G-x8EVFg@mail.gmail.com>
 <CAJDiXghaazbrQMZZS08d9Ffh2y4w05TgH9dpBhqChv1qNTp+xA@mail.gmail.com>
 <CAD21AoDbaNtLrFRxG9OG5WrBd7DCs4q+CfJd8AJTBEqRri4WeQ@mail.gmail.com>
 <CAJDiXgjjd1jL86B--AyRo2tDM1Wiu+7Pduwh5d0u_UM8GRugvw@mail.gmail.com>
 <CAD21AoBo_wS7y0X7_7ajEFkptzo9ZrF8RFNRnu2Xe8XL74o0SQ@mail.gmail.com>
 <CAJDiXggH1bW=4n+55CGLvs_sRU4SYNXwYLZ37wvJ5H_3yURSPw@mail.gmail.com>
 <CAD21AoDxhN8Z6Lx1ZicBXKkbMsRQqEXiq4ALs4uaD648iSvXoA@mail.gmail.com>
 <CAJDiXgh3Dg2f5k3xRJnzoY39jQENUhh125ArYapXkSu5D7JJuw@mail.gmail.com>
 <CAD21AoBYc7L7W4dRdxeoJzOH5OgpiCAtKz-54iX4Ufn8PnQoww@mail.gmail.com>
 <CAJDiXgi8X-DMb92v5WHLCNxDHxH9gO8WQxOMtdpmU7X=WXCiuQ@mail.gmail.com>
 <CAD21AoDKxs0UrTwa3rkP+kE9AzccabpK7G-Tk=HYneaFTZBtiA@mail.gmail.com>
 <CALj2ACUJ0TtYWtFuXXVf0aLES8tfZePXnB8WQ=0KCrNaABzQVg@mail.gmail.com>
 <CAJDiXgj=-R1z7H7+npm-o+q6YBkr5_6Qe=1wcy47ovAqej4TkA@mail.gmail.com>
 <CAHg+QDehxaJEd1Yp1MpW8UO71xmbasy7t2GZGvqOYwkr0md8DQ@mail.gmail.com>
 <CAJDiXgi73x7h0=UoXriFjskRB6htZ-uqXKqvWN3RefuxbP93gA@mail.gmail.com>
 <CAHg+QDdRyxC-cBk7CK-=pnfqFNWh6BeFDDnz3CSUPyoTbdUJ+A@mail.gmail.com>
In-Reply-To: 
 <CAHg+QDdRyxC-cBk7CK-=pnfqFNWh6BeFDDnz3CSUPyoTbdUJ+A@mail.gmail.com>
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 1 Apr 2026 11:54:26 -0700
Message-ID: 
 <CAD21AoAwcevLebQO6+MuWoOi9XNrNMg7gubFbLao2QkrRbOfMQ@mail.gmail.com>
Subject: Re: POC: Parallel processing of indexes in autovacuum
To: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Cc: Daniil Davydov <3danissimo@gmail.com>,
	Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>,
 Sami Imseih <samimseih@gmail.com>,
	Alexander Korotkov <aekorotkov@gmail.com>,
 Matheus Alcantara <matheusssilv97@gmail.com>,
	Maxim Orlov <orlovmg@gmail.com>,
 Postgres hackers <pgsql-hackers@lists.postgresql.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: 
 <https://www.postgresql.org/message-id/CAD21AoAwcevLebQO6%2BMuWoOi9XNrNMg7gubFbLao2QkrRbOfMQ%40mail.gmail.com>
Precedence: bulk

On Mon, Mar 30, 2026 at 5:14 PM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
>
> Hi
>
> On Mon, Mar 30, 2026 at 1:44 AM Daniil Davydov <3danissimo@gmail.com> wrote:
>>
>> Hi,
>>
>> On Mon, Mar 30, 2026 at 7:17 AM SATYANARAYANA NARLAPURAM
>> <satyanarlapuram@gmail.com> wrote:
>> >
>> > Thank you for working on this, very useful feature. Sharing a few thoughts:
>> >
>> > 1. Shouldn't we also cap by max_parallel_workers to avoid wasting DSM resources in parallel_vacuum_compute_workers?
>>
>> Actually, autovacuum_max_parallel_workers is already limited by
>> max_parallel_workers. It is not clear for me why we allow setting this GUC
>> higher than max_parallel_workers, but if this happens, I think it is a user's
>> misconfiguration.
>>
>> > 2. Is it intentional that other autovacuum workers not yield cost limits to the parallel auto vacuum workers? Cost limits are distributed first equally to the autovacuum workers.
>> > and then they share that. Therefore, parallel workers will be heavily throttled. IIUC, this problem doesn't exist with manual vacuum.
>> >  If we don't fix this, at least we should document this.
>>
>> Parallel a/v workers inherit cost based parameters (including the
>> vacuum_cost_limit) from the leader worker. Do you mean that this can be too
>> low value for parallel operation? If so, user can manually increase the
>> vacuum_cost_limit reloption for those tables, where parallel a/v sleeps too
>> much (due to cost delay).
>>
>> BTW, describing the cost limit propagation to the parallel a/v workers is
>> worth mentioning in the documentation. I'll add it in the next patch version.
>>
>> > 3. Additionally, is there a point where, based on the cost limits, launching additional workers becomes counterproductive compared to running fewer workers and preventing it?
>>
>> I don't think that we can possibly find a universal limit that will be
>> appropriate for all possible configurations. By now we are using a pretty
>> simple formula for parallel degree calculation. Since user have several ways
>> to affect this formula, I guess that there will be no problems with it (except
>> my concerns about opt-out style).
>>
>> > 4. Would it make sense to add a table level override to disable parallelism or set parallel worker count?
>>
>> We already have the "autovacuum_parallel_workers" reloption that is used as
>> an additional limit for the number of parallel workers. In particular, this
>> reloption can be used to disable parallelism at all.
>>
>> >
>> > I ran some perf tests to show the improvements with parallel vacuum and shared below.
>>
>> Thank you very much!
>>
>> > Observations:
>> >
>> > 1. Parallel autovacuum provides consistent speedup. With cost_limit=200 and
>> >    7 workers, vacuum completes 1.41x faster (71s -> 50s). With cost_limit=60,
>> >    the speedup is 1.25x (194s -> 154s).
>> > 2. I see the benefit comes from parallelizing index vacuum. With 8 indexes totaling
>> >    ~530 MB, parallel workers scan indexes concurrently instead of the leader
>> >    scanning them one by one. The leader's CPU user time drops from ~3s to
>> >    ~0.8s as index work is offloaded
>> >
>>
>> 1.41 speedup with 7 parallel workers may not seem like a great win, but it is
>> a whole time of autovacuum operation (not only index bulkdel/cleanup) with
>> pretty small indexes.
>>
>> May I ask you to run the same test with a higher table's size (several dozen
>> gigabytes)? I think the results will be more "expressive".
>
>
> I ran it with a Billion rows in a table with 8 indexes. The improvement with 7 workers is 1.8x.
> Please note that there is a fixed overhead in other vacuum steps, for example heap scan.
> In the environments where cost-based delay is used (the default), benefits will be modest
> unless vacuum_cost_delay is set to sufficiently large value.
>
> Hardware:
>   CPU:     Intel Xeon Platinum 8573C, 1 socket × 8 cores × 2 threads = 16 vCPUs
>   RAM:     128 GB (131,900 MB)
>   Swap:    None
>
> Workload Description
>
> Table Schema:
>   CREATE TABLE avtest (
>       id       bigint PRIMARY KEY,
>       col1     int,           -- random()*1e9
>       col2     int,           -- random()*1e9
>       col3     int,           -- random()*1e9
>       col4     int,           -- random()*1e9
>       col5     int,           -- random()*1e9
>       col6     text,          -- 'text_' || random()*1e6  (short text ~10 chars)
>       col7     timestamp,     -- now() - random()*365 days
>       padding  text           -- repeat('x', 50)
>   ) WITH (fillfactor = 90);
>
> Indexes (8 total):
>   avtest_pkey   - btree on (id)        bigint
>   idx_av_col1   - btree on (col1)      int
>   idx_av_col2   - btree on (col2)      int
>   idx_av_col3   - btree on (col3)      int
>   idx_av_col4   - btree on (col4)      int
>   idx_av_col5   - btree on (col5)      int
>   idx_av_col6   - btree on (col6)      text
>   idx_av_col7   - btree on (col7)      timestamp
>
> Dead Tuple Generation:
>   DELETE FROM avtest WHERE id % 5 IN (1, 2);
>   This deletes exactly 40% of rows, uniformly distributed across all pages.
>
> Vacuum Trigger:
>   Autovacuum is triggered naturally by lowering the threshold to 0 and setting
>   scale_factor to a value that causes immediate launch after the DELETE.
>
> Worker Configurations Tested:
>   0 workers  - leader-only vacuum (baseline, no parallelism)
>   2 workers  - leader + 2 parallel workers (3 processes total)
>   4 workers  - leader + 4 parallel workers (5 processes total)
>   7 workers  - leader + 7 parallel workers (8 processes total, 1 per index)
>
> Dataset:
>   Rows:         1,000,000,000
>   Heap size:    139 GB
>   Total size:   279 GB (heap + 8 indexes)
>   Dead tuples:  400,000,000 (40%)
>
> Index Sizes:
>   avtest_pkey    21 GB   (bigint)
>   idx_av_col7    21 GB   (timestamp)
>   idx_av_col1    18 GB   (int)
>   idx_av_col2    18 GB   (int)
>   idx_av_col3    18 GB   (int)
>   idx_av_col4    18 GB   (int)
>   idx_av_col5    18 GB   (int)
>   idx_av_col6     7 GB   (text - shorter keys, smaller index)
>   Total indexes: 139 GB
>
> Server Settings:
>   shared_buffers                = 96GB
>   maintenance_work_mem          = 1GB
>   max_wal_size                  = 100GB
>   checkpoint_timeout            = 1h
>   autovacuum_vacuum_cost_delay  = 0ms (NO throttling)
>   autovacuum_vacuum_cost_limit  = 1000
>
>
> Summary:
>
> Workers  Avg(s)    Min(s)    Max(s)    Speedup   Time Saved
> -------  ------    ------    ------    -------   ----------
> 0        1645.93   1645.01   1646.84    1.00x          -
> 2        1276.35   1275.64   1277.05    1.29x     369.58s (6.2 min)
> 4        1052.62   1048.92   1056.32    1.56x     593.31s (9.9 min)
> 7         892.23    886.59    897.86    1.84x     753.70s (12.6 min)
>

Thank you for sharing the performance test results!

While the benchmark results look good to me, have you compared the
performance differences between parallel vacuum in the VACUUM command
(with the PARALLEL option) and parallel vacuum in autovacuum? Since
parallel autovacuum introduces some logic to check for delay parameter
updates, I thought it was worth verifying if this adds any overhead.

BTW, in my view, the most challenging part of this patch is the
propagation logic for vacuum delay parameters. This propagation is
necessary because, unlike manual VACUUM, autovacuum workers can reload
their configuration during operation. We must ensure that parallel
workers stay synchronized with these updated parameters.

The current patch implements this in vacuumparallel.c: the leader
shares the delay parameters in DSM and, after a config reload, updates
them if any vacuum delay parameter has changed, while workers poll for
updates at every vacuum_delay_point() call to refresh their local
variables.
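
To illustrate the polling idea, here is a minimal standalone sketch in
C11. It is not the patch's code: the struct and function names
(SharedDelayParams, leader_publish_params, worker_poll_params), the
generation counter, and the tiny spinlock are all assumptions made for
illustration; the real logic lives in vacuumparallel.c and uses
PostgreSQL's own DSM and atomics facilities.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state; in the patch this would live in DSM. */
typedef struct SharedDelayParams
{
    atomic_flag lock;           /* tiny spinlock protecting the payload */
    atomic_uint generation;     /* bumped by the leader on every update */
    double      cost_delay_ms;
    int         cost_limit;
} SharedDelayParams;

/* Hypothetical per-worker copy of the parameters. */
typedef struct LocalDelayParams
{
    unsigned    seen_generation;
    double      cost_delay_ms;
    int         cost_limit;
} LocalDelayParams;

static void
spin_lock(atomic_flag *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;                       /* busy-wait; updates are rare and short */
}

static void
spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

static void
shared_params_init(SharedDelayParams *shared)
{
    atomic_flag_clear(&shared->lock);
    atomic_init(&shared->generation, 0);
    shared->cost_delay_ms = 0.0;
    shared->cost_limit = 0;
}

/* Leader side: called after a config reload changed a delay parameter. */
static void
leader_publish_params(SharedDelayParams *shared, double delay_ms, int limit)
{
    spin_lock(&shared->lock);
    shared->cost_delay_ms = delay_ms;
    shared->cost_limit = limit;
    atomic_fetch_add_explicit(&shared->generation, 1, memory_order_release);
    spin_unlock(&shared->lock);
}

/*
 * Worker side: called at every delay point.  The common case is a single
 * atomic load and a compare; the lock is taken only when something changed.
 * Returns true if the local copy was refreshed.
 */
static bool
worker_poll_params(SharedDelayParams *shared, LocalDelayParams *local)
{
    unsigned    gen = atomic_load_explicit(&shared->generation,
                                           memory_order_acquire);

    if (gen == local->seen_generation)
        return false;           /* fast path: nothing to do */

    spin_lock(&shared->lock);
    local->cost_delay_ms = shared->cost_delay_ms;
    local->cost_limit = shared->cost_limit;
    local->seen_generation = atomic_load_explicit(&shared->generation,
                                                  memory_order_relaxed);
    spin_unlock(&shared->lock);
    return true;
}
```

In this sketch a worker that sees an unchanged generation does no more
than one atomic load per call, which is the property the polling
approach relies on.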

Another possible approach would be an event-driven model where the
leader notifies workers after updating shared parameters, for example
by adding a shm_mq between the leader (as the sender) and each worker
(as the receiver).

I've compared these two ideas and opted for the former (polling).
While a polling approach could theoretically be costly, the current
implementation is self-contained within the parallel vacuum logic and
does not touch the core parallel query infrastructure. The
notification approach might look more elegant, but I'm concerned it
adds unnecessary complexity just for the autovacuum case. Since the
polling is essentially just checking an atomic variable, the overhead
should be negligible.

To verify this, I conducted benchmarks comparing the total execution
time and the index vacuuming duration.

Setup:

- Disabled (auto) vacuum delays and buffer usage limits.
- Parallel autovacuum with 1 worker on a table with 2 indexes (approx.
4 GB each).
- 5 runs.

Case 1: The latest patch (with polling)

Average: 3.95s (Index: 1.54s)
Median: 3.62s (Index: 1.37s)

Case 2: The latest patch without polling

Average: 3.98s (Index: 1.56s)
Median: 3.70s (Index: 1.40s)

Note that in order to simulate the code that doesn't have the polling,
I reverted the following change:

-   if (InterruptPending ||
-       (!VacuumCostActive && !ConfigReloadPending))
+   if (InterruptPending)
+       return;
+
+   if (IsParallelWorker())
+   {
+       /*
+        * Update cost-based vacuum delay parameters for a parallel autovacuum
+        * worker if any changes are detected.
+        */
+       parallel_vacuum_update_shared_delay_params();
+   }
+
+   if (!VacuumCostActive && !ConfigReloadPending)

With that change reverted, the parallel vacuum workers don't check the
shared vacuum delay parameters at all, which is still fine as I
disabled vacuum delays.

Overall, the results show no noticeable overhead from the polling approach.
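
For reference, the relative differences behind that conclusion can be
recomputed from the averages and medians reported above. The helper
below is only an illustrative sketch (relative_diff_pct and
print_overhead_report are hypothetical names, not part of the patch).
With these numbers all four differences come out slightly negative,
i.e. the polling build is marginally faster, and each is under 3%.

```c
#include <stdio.h>

/*
 * Percentage difference of the build with polling relative to the build
 * without polling.  Negative means the polling build was faster.
 */
static double
relative_diff_pct(double with_polling, double without_polling)
{
    return (with_polling - without_polling) / without_polling * 100.0;
}

/* Print the four comparisons using the timings (in seconds) from the runs. */
static void
print_overhead_report(void)
{
    printf("total, average: %+.2f%%\n", relative_diff_pct(3.95, 3.98));
    printf("total, median:  %+.2f%%\n", relative_diff_pct(3.62, 3.70));
    printf("index, average: %+.2f%%\n", relative_diff_pct(1.54, 1.56));
    printf("index, median:  %+.2f%%\n", relative_diff_pct(1.37, 1.40));
}
```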

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com