MIME-Version: 1.0
References: 
 <CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com>
 <CAD21AoCdx5ZNS_cO7bYz1Zfb+Kw1kuJV2wtewrz7T1pPpjcWGw@mail.gmail.com>
 <CAJDiXgi6ZQOoSEqj9RyZMEh+HHBtmW0+PHD85UNPtKch8ubvdg@mail.gmail.com>
 <CAD21AoBcoA-i-pJ_=y+jg14R8_QaJA1iwktCnu5i-C=yXDFPdA@mail.gmail.com>
 <CAJDiXgjnUdE6Sk4M0unmT+9dULyFAxcum2txQKpWTuo4uQ_oXQ@mail.gmail.com>
 <CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>
 <CAA5RZ0vfBg=c_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag@mail.gmail.com>
 <CAD21AoBgvUeWS8ZsXBahA1XdYayK6DJ6dx49d6Xpii-iH+Hrwg@mail.gmail.com>
 <CAA5RZ0vF+Lr-jU1LAZWTGUjboUETk8oLvaNBbA5ozX6dau+how@mail.gmail.com>
 <CAJDiXggueLSGMNRmLshbmFRfbo4jzks0W8bLDfUSRZ-61fPVEQ@mail.gmail.com>
 <CAFY6G8cJ=DRTX75pOGerH6sk39dRt+7MSH+y_qppDdhPs=qdQA@mail.gmail.com>
 <CAJDiXgg1t6wk9NjyMUTm1iKqM9GtdQ_wrEchBtz3xjWBZM8W8A@mail.gmail.com>
 <CAD21AoAC0=Xi38RQcAO4A+vdmoXToZMoHfbS=KLT49fAOTH_gA@mail.gmail.com>
 <CAJDiXgiD+AZKhJSn-FSRVQxtDLmJd95wDu4wtKniQF5==1JcjQ@mail.gmail.com>
 <CAD21AoAM8KsqNhrZYJuf7odvxcTC0TumXazJc-r_wC5KnDFDPg@mail.gmail.com>
 <CAJDiXghbcOC9OOj3ampxuyqXH0geggnosnrYUHGygkpss-RtxA@mail.gmail.com>
 <CAD21AoAPnq0vrcGgeN++r1GoL8Kza7jaGL=TNzuBn6+MkR=rUQ@mail.gmail.com>
 <CAJDiXghmsbTmnm--9B5bbuZXa1OL7SZ0HYppX3tx9XsdwfJBhA@mail.gmail.com>
 <DB3C67FCRLOO.1R5NLYCNEA6BF@gmail.com>
 <CAJDiXgiYiX+azuR76DcVx8fZn57m_4v6cB14-GW34mWa=qudFQ@mail.gmail.com>
 <CAD21AoDtPpkkQ_h1yf4oTx1qn4SRdTeVY3qs+9J07fYqa_4Gww@mail.gmail.com>
 <CAJDiXgi7KB7wSQ=Ux=ngdaCvJnJ5x-ehvTyiuZez+5uKHtV6iQ@mail.gmail.com>
 <CAD21AoCcHKKXsr9Oh736ejckqqS1i430xGEyJ=JP5OL0ExyP1A@mail.gmail.com>
 <CAJDiXghaFT_1sSv3q8mjyZ_RLZDgiogg0mWRvLxSWvkUi2CcLg@mail.gmail.com>
 <CAA5RZ0u63W41OmcEO+HLs4CSo-Sd3J+Q-4=04iud8V=xX4iUrA@mail.gmail.com>
 <CAJDiXgin1TXniVGJKzOTA=F9K342uVfm6O0EmubTVB=F+XSrbA@mail.gmail.com>
 <CAD21AoDadzAwibxf-+urjx=XL+eVu8=Ut-Lh2GxXUt32LbPG3Q@mail.gmail.com>
 <CAD21AoD6HhraqhOgkQJOrr0ixZkAZuqJRpzGv-B+_-ad6d5aPw@mail.gmail.com>
 <CAJDiXgiGSpqMQSOx-cVO_LtcB5GWHBy9ph7oOR4ebbX8A==kgw@mail.gmail.com>
 <CAD21AoBRRXbNJEvCjS-0XZgCEeRBzQPKmrSDjJ3wZ8TN28vaCQ@mail.gmail.com>
 <CAPpHfduBJfMcojvmYHUo8b_C=0cxRy1N+tNiNGoA3RAZq2ApaA@mail.gmail.com>
 <CAD21AoC82NeHKXc965pPUZO2eyo1U7P6cmfRJbrcPDcnd7_6hw@mail.gmail.com>
 <CAJDiXghP2kXnEz+cj3rAWNM3NdKSB_4WtnngFXpVz2omPhGr5A@mail.gmail.com>
 <CAD21AoA0bnRZC_OqKMnH-Ln+OZ9z9k56j2c_MXj8pw69O-wkBw@mail.gmail.com>
 <CAA5RZ0sSXDza7_nUUbhHL_Sws+M+HR1daKJPXHpdLuNCkwUgUg@mail.gmail.com>
 <CAJDiXggrBsbzOisf+Nu8pZkYGrpUZaFbosL1Wbm3kKxzTm4xgw@mail.gmail.com>
 <CAA5RZ0tbiPcgQEjnhdnjz6qSjfRsGrr8jGCaMcrMaoPpax3wig@mail.gmail.com>
 <CAJDiXgjt5ZmK2uvS0E8Ztt5ePYmq8Ze_dG05Zo2NUsKLHCEuYA@mail.gmail.com>
 <CAD21AoB7v5tLPXLK=qmtt6PaEC1f+Fb-gh+MwAbXfm6x4eZGNw@mail.gmail.com>
 <CAJDiXghwtUbiFnAh3nSaxTk8KFupQuMbp+g4z3wOLoQfMuqgDg@mail.gmail.com>
 <CAJDiXgjoNd4BF19HNY_FAcDUqiqsfw8cGhNOJwBxahB8P38E3Q@mail.gmail.com>
 <CAD21AoBT1LWqPZkcHpVMVh0ZOXUneO=p61t0i8cQ+kOP9qfODQ@mail.gmail.com>
 <CAJDiXggL=J0nV7PfBsMW9+UOU3KUp1jNBM9Gov1JvAX7aG_U1g@mail.gmail.com>
 <CAD21AoDz-1Zf9DOJJrdcB2=eNA4UdywthkowNp_dHmOGC-yV_g@mail.gmail.com>
 <CAJDiXgjzphJ313=aDwbvryHpmTi6AqE+-5crysTtzKv01-vkzA@mail.gmail.com>
 <CAD21AoD7_4gsQ2a82zO3SaRwjdw_3tyiYDHNFPUKQ5DAA5HOtA@mail.gmail.com>
 <CAJDiXggY1QzNde6_HhpzneLc9dYqmWZ+PY39cuBXYdcCTuoJBA@mail.gmail.com>
 <CAD21AoCFPiS2jcMA1JaV1kT8xrGz5BpN7iBP_gCgRuaANEbciA@mail.gmail.com>
 <CAJDiXgh6jmNGR3uOB_6YeGhNkR2=HdTdEYjmHXdumNzyY4MckQ@mail.gmail.com>
 <CAD21AoDs3SOXeAEoCRizfEKybpRkE7t7poX0+iZ6MM1MFWMsfA@mail.gmail.com>
 <CAJDiXgjTkuqSPerC_nasxDz6d2Komf1ipYKV6SupDRnc9yhO9w@mail.gmail.com>
 <CAD21AoAXMjX03h5K84u0heBLU+fqGgWBGBDwnBDGSs=DhyF9pQ@mail.gmail.com>
 <CAJDiXghjZEAYboGhujgGvY9=RiFD01ERHVVF+NQMuuAKVZDmDQ@mail.gmail.com>
 <CAD21AoAD+N5SxBr0qL7TeWnvq4iYmFT=DyWdNLQPB-XntYkwEg@mail.gmail.com>
 <CAJDiXgjgn87sH1-MmONPKkeYJG83C0ChrYkYn9UcRonLhOOfOw@mail.gmail.com>
 <CAD21AoCoJYauWO78M-CGdHpYfcqEZVV5a1Z-7wWB=-G-x8EVFg@mail.gmail.com>
 <CAJDiXghaazbrQMZZS08d9Ffh2y4w05TgH9dpBhqChv1qNTp+xA@mail.gmail.com>
 <CAD21AoDbaNtLrFRxG9OG5WrBd7DCs4q+CfJd8AJTBEqRri4WeQ@mail.gmail.com>
 <CAJDiXgjjd1jL86B--AyRo2tDM1Wiu+7Pduwh5d0u_UM8GRugvw@mail.gmail.com>
 <CAD21AoBo_wS7y0X7_7ajEFkptzo9ZrF8RFNRnu2Xe8XL74o0SQ@mail.gmail.com>
 <CAJDiXggH1bW=4n+55CGLvs_sRU4SYNXwYLZ37wvJ5H_3yURSPw@mail.gmail.com>
 <CAD21AoDxhN8Z6Lx1ZicBXKkbMsRQqEXiq4ALs4uaD648iSvXoA@mail.gmail.com>
 <CAJDiXgh3Dg2f5k3xRJnzoY39jQENUhh125ArYapXkSu5D7JJuw@mail.gmail.com>
 <CAD21AoBYc7L7W4dRdxeoJzOH5OgpiCAtKz-54iX4Ufn8PnQoww@mail.gmail.com>
 <CAJDiXgi8X-DMb92v5WHLCNxDHxH9gO8WQxOMtdpmU7X=WXCiuQ@mail.gmail.com>
 <CAD21AoDKxs0UrTwa3rkP+kE9AzccabpK7G-Tk=HYneaFTZBtiA@mail.gmail.com>
 <CALj2ACUJ0TtYWtFuXXVf0aLES8tfZePXnB8WQ=0KCrNaABzQVg@mail.gmail.com>
 <CAJDiXgj=-R1z7H7+npm-o+q6YBkr5_6Qe=1wcy47ovAqej4TkA@mail.gmail.com>
 <CAHg+QDehxaJEd1Yp1MpW8UO71xmbasy7t2GZGvqOYwkr0md8DQ@mail.gmail.com>
 <CAJDiXgi73x7h0=UoXriFjskRB6htZ-uqXKqvWN3RefuxbP93gA@mail.gmail.com>
 <CAHg+QDdRyxC-cBk7CK-=pnfqFNWh6BeFDDnz3CSUPyoTbdUJ+A@mail.gmail.com>
 <CAD21AoAwcevLebQO6+MuWoOi9XNrNMg7gubFbLao2QkrRbOfMQ@mail.gmail.com>
In-Reply-To: 
 <CAD21AoAwcevLebQO6+MuWoOi9XNrNMg7gubFbLao2QkrRbOfMQ@mail.gmail.com>
From: Alexander Korotkov <aekorotkov@gmail.com>
Date: Thu, 2 Apr 2026 14:02:40 +0300
Message-ID: 
 <CAPpHfduXzE7OeP0QgPEBhG8-4xg=wGgjoJi9c6-8kN9Fyji96g@mail.gmail.com>
Subject: Re: POC: Parallel processing of indexes in autovacuum
To: Masahiko Sawada <sawada.mshk@gmail.com>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>,
 Daniil Davydov <3danissimo@gmail.com>,
	Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>,
 Sami Imseih <samimseih@gmail.com>,
	Matheus Alcantara <matheusssilv97@gmail.com>,
 Maxim Orlov <orlovmg@gmail.com>,
	Postgres hackers <pgsql-hackers@lists.postgresql.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: 
 <https://www.postgresql.org/message-id/CAPpHfduXzE7OeP0QgPEBhG8-4xg%3DwGgjoJi9c6-8kN9Fyji96g%40mail.gmail.com>
Precedence: bulk

Hi!

On Wed, Apr 1, 2026 at 9:55 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Mar 30, 2026 at 5:14 PM SATYANARAYANA NARLAPURAM
> <satyanarlapuram@gmail.com> wrote:
> >
> > Hi
> >
> > On Mon, Mar 30, 2026 at 1:44 AM Daniil Davydov <3danissimo@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On Mon, Mar 30, 2026 at 7:17 AM SATYANARAYANA NARLAPURAM
> >> <satyanarlapuram@gmail.com> wrote:
> >> >
> >> > Thank you for working on this, very useful feature. Sharing a few thoughts:
> >> >
> >> > 1. Shouldn't we also cap by max_parallel_workers to avoid wasting DSM resources in parallel_vacuum_compute_workers?
> >>
> >> Actually, autovacuum_max_parallel_workers is already limited by
> >> max_parallel_workers. It is not clear for me why we allow setting this GUC
> >> higher than max_parallel_workers, but if this happens, I think it is a user's
> >> misconfiguration.
> >>
> >> > 2. Is it intentional that other autovacuum workers not yield cost limits to the parallel auto vacuum workers? Cost limits are distributed first equally to the autovacuum workers.
> >> > and then they share that. Therefore, parallel workers will be heavily throttled. IIUC, this problem doesn't exist with manual vacuum.
> >> >  If we don't fix this, at least we should document this.
> >>
> >> Parallel a/v workers inherit cost based parameters (including the
> >> vacuum_cost_limit) from the leader worker. Do you mean that this can be too
> >> low value for parallel operation? If so, user can manually increase the
> >> vacuum_cost_limit reloption for those tables, where parallel a/v sleeps too
> >> much (due to cost delay).
> >>
> >> BTW, describing the cost limit propagation to the parallel a/v workers is
> >> worth mentioning in the documentation. I'll add it in the next patch version.
> >>
> >> > 3. Additionally, is there a point where, based on the cost limits, launching additional workers becomes counterproductive compared to running fewer workers and preventing it?
> >>
> >> I don't think that we can possibly find a universal limit that will be
> >> appropriate for all possible configurations. By now we are using a pretty
> >> simple formula for parallel degree calculation. Since user have several ways
> >> to affect this formula, I guess that there will be no problems with it (except
> >> my concerns about opt-out style).
> >>
> >> > 4. Would it make sense to add a table level override to disable parallelism or set parallel worker count?
> >>
> >> We already have the "autovacuum_parallel_workers" reloption that is used as
> >> an additional limit for the number of parallel workers. In particular, this
> >> reloption can be used to disable parallelism at all.
> >>
> >> >
> >> > I ran some perf tests to show the improvements with parallel vacuum and shared below.
> >>
> >> Thank you very much!
> >>
> >> > Observations:
> >> >
> >> > 1. Parallel autovacuum provides consistent speedup. With cost_limit=200 and
> >> >    7 workers, vacuum completes 1.41x faster (71s -> 50s). With cost_limit=60,
> >> >    the speedup is 1.25x (194s -> 154s).
> >> > 2. I see the benefit comes from parallelizing index vacuum. With 8 indexes totaling
> >> >    ~530 MB, parallel workers scan indexes concurrently instead of the leader
> >> >    scanning them one by one. The leader's CPU user time drops from ~3s to
> >> >    ~0.8s as index work is offloaded
> >> >
> >>
> >> 1.41 speedup with 7 parallel workers may not seem like a great win, but it is
> >> a whole time of autovacuum operation (not only index bulkdel/cleanup) with
> >> pretty small indexes.
> >>
> >> May I ask you to run the same test with a higher table's size (several dozen
> >> gigabytes)? I think the results will be more "expressive".
> >
> >
> > I ran it with a Billion rows in a table with 8 indexes. The improvement with 7 workers is 1.8x.
> > Please note that there is a fixed overhead in other vacuum steps, for example heap scan.
> > In the environments where cost-based delay is used (the default), benefits will be modest
> > unless vacuum_cost_delay is set to sufficiently large value.
> >
> > Hardware:
> >   CPU:     Intel Xeon Platinum 8573C, 1 socket × 8 cores × 2 threads = 16 vCPUs
> >   RAM:     128 GB (131,900 MB)
> >   Swap:    None
> >
> > Workload Description
> >
> > Table Schema:
> >   CREATE TABLE avtest (
> >       id       bigint PRIMARY KEY,
> >       col1     int,           -- random()*1e9
> >       col2     int,           -- random()*1e9
> >       col3     int,           -- random()*1e9
> >       col4     int,           -- random()*1e9
> >       col5     int,           -- random()*1e9
> >       col6     text,          -- 'text_' || random()*1e6  (short text ~10 chars)
> >       col7     timestamp,     -- now() - random()*365 days
> >       padding  text           -- repeat('x', 50)
> >   ) WITH (fillfactor = 90);
> >
> > Indexes (8 total):
> >   avtest_pkey   - btree on (id)        bigint
> >   idx_av_col1   - btree on (col1)      int
> >   idx_av_col2   - btree on (col2)      int
> >   idx_av_col3   - btree on (col3)      int
> >   idx_av_col4   - btree on (col4)      int
> >   idx_av_col5   - btree on (col5)      int
> >   idx_av_col6   - btree on (col6)      text
> >   idx_av_col7   - btree on (col7)      timestamp
> >
> > Dead Tuple Generation:
> >   DELETE FROM avtest WHERE id % 5 IN (1, 2);
> >   This deletes exactly 40% of rows, uniformly distributed across all pages.
> >
> > Vacuum Trigger:
> >   Autovacuum is triggered naturally by lowering the threshold to 0 and setting
> >   scale_factor to a value that causes immediate launch after the DELETE.
> >
> > Worker Configurations Tested:
> >   0 workers  - leader-only vacuum (baseline, no parallelism)
> >   2 workers  - leader + 2 parallel workers (3 processes total)
> >   4 workers  - leader + 4 parallel workers (5 processes total)
> >   7 workers  - leader + 7 parallel workers (8 processes total, 1 per index)
> >
> > Dataset:
> >   Rows:         1,000,000,000
> >   Heap size:    139 GB
> >   Total size:   279 GB (heap + 8 indexes)
> >   Dead tuples:  400,000,000 (40%)
> >
> > Index Sizes:
> >   avtest_pkey    21 GB   (bigint)
> >   idx_av_col7    21 GB   (timestamp)
> >   idx_av_col1    18 GB   (int)
> >   idx_av_col2    18 GB   (int)
> >   idx_av_col3    18 GB   (int)
> >   idx_av_col4    18 GB   (int)
> >   idx_av_col5    18 GB   (int)
> >   idx_av_col6     7 GB   (text - shorter keys, smaller index)
> >   Total indexes: 139 GB
> >
> > Server Settings:
> >   shared_buffers                = 96GB
> >   maintenance_work_mem          = 1GB
> >   max_wal_size                  = 100GB
> >   checkpoint_timeout            = 1h
> >   autovacuum_vacuum_cost_delay  = 0ms (NO throttling)
> >   autovacuum_vacuum_cost_limit  = 1000
> >
> >
> > Summary:
> >
> > Workers  Avg(s)    Min(s)    Max(s)    Speedup   Time Saved
> > -------  ------    ------    ------    -------   ----------
> > 0        1645.93   1645.01   1646.84    1.00x          -
> > 2        1276.35   1275.64   1277.05    1.29x     369.58s (6.2 min)
> > 4        1052.62   1048.92   1056.32    1.56x     593.31s (9.9 min)
> > 7         892.23    886.59    897.86    1.84x     753.70s (12.6 min)
> >
>
> Thank you for sharing the performance test results!
>
> While the benchmark results look good to me, have you compared the
> performance differences between parallel vacuum in the VACUUM command
> (with the PARALLEL option) and parallel vacuum in autovacuum? Since
> parallel autovacuum introduces some logic to check for delay parameter
> updates, I thought it was worth verifying if this adds any overhead.
>
> BTW, in my view, the most challenging part of this patch is the
> propagation logic for vacuum delay parameters. This propagation is
> necessary because, unlike manual VACUUM, autovacuum workers can reload
> their configuration during operation. We must ensure that parallel
> workers stay synchronized with these updated parameters.
>
> The current patch implements this in vacuumparallel.c: the leader
> shares delay parameters in DSM and updates them (if any vacuum delay
> parameters are updated) after a config reload, while workers poll for
> updates at every vacuum_delay_point() call to refresh their local
> variables.
>
> Another possible approach would be an event-driven model where the
> leader notifies workers after updating shared parameters, for example,
> by adding a shm_mq between the leader (as the sender) and each worker
> (as the receiver).
>
> I've compared these two ideas and opted for the former (polling).
> While a polling approach could theoretically be costly, the current
> implementation is self-contained within the parallel vacuum logic and
> does not touch the core parallel query infrastructure. The
> notification approach might look more elegant, but I'm concerned it
> adds unnecessary complexity just for the autovacuum case. Since the
> polling is essentially just checking an atomic variable, the overhead
> should be negligible.
>
> To verify this, I conducted benchmarks comparing the whole execution
> time and index vacuuming duration.
>
> Setup:
>
> - Disabled (auto) vacuum delays and buffer usage limits.
> - Parallel autovacuum with 1 worker on a table with 2 indexes (approx.
> 4 GB each).
> - 5 runs.
>
> Case 1: The latest patch (with polling)
>
> Average: 3.95s (Index: 1.54s)
> Median: 3.62s (Index: 1.37s)
>
> Case 2: The latest patch without polling
>
> Average: 3.98s (Index: 1.56s)
> Median: 3.70s (Index: 1.40s)
>
> Note that in order to simulate the code that doesn't have the polling,
> I reverted the following change:
>
> -   if (InterruptPending ||
> -       (!VacuumCostActive && !ConfigReloadPending))
> +   if (InterruptPending)
> +       return;
> +
> +   if (IsParallelWorker())
> +   {
> +       /*
> +        * Update cost-based vacuum delay parameters for a parallel autovacuum
> +        * worker if any changes are detected.
> +        */
> +       parallel_vacuum_update_shared_delay_params();
> +   }
> +
> +   if (!VacuumCostActive && !ConfigReloadPending)
>
> The parallel vacuum workers don't check the shared vacuum delay
> parameter at all, which is still fine as I disabled vacuum delays.
>
> Overall, the results show no noticeable overhead from the polling approach.

I would say this polling approach is very cheap.  When there are no
updates, it only has to read a single 32-bit value from shared
memory.  And that value is rarely updated, so it caches well.  No
wonder we see no measurable overhead.

Regarding the event-driven approach, given that the parallel worker
process is busy with other work (doing the actual vacuuming), it would
have to poll for new events from time to time anyway.  Thus, I don't
think it's possible to make polling for new events any cheaper
than the current approach of polling for updates in shmem.  If the
worker process were just waiting for GUC updates without any other
work to do, then, for instance, waiting on a latch would be cheaper
than polling in a loop, but that's not our case.

I don't see the current polling approach for GUC updates as a
performance problem.

------
Regards,
Alexander Korotkov
Supabase