Re: POC: Parallel processing of indexes in autovacuum

From: Masahiko Sawada <[email protected]>
To: Alexander Korotkov <[email protected]>
Cc: SATYANARAYANA NARLAPURAM <[email protected]>
Cc: Daniil Davydov <[email protected]>
Cc: Bharath Rupireddy <[email protected]>
Cc: Sami Imseih <[email protected]>
Cc: Matheus Alcantara <[email protected]>
Cc: Maxim Orlov <[email protected]>
Cc: Postgres hackers <[email protected]>
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date: Thu, 2 Apr 2026 16:30:42 -0700
Message-ID: <CAD21AoACRVCT-ub+LTAtDaEZjxmwFcC7ON9_jfqpYegPdeXXOA@mail.gmail.com>
In-Reply-To: <CAPpHfduXzE7OeP0QgPEBhG8-4xg=wGgjoJi9c6-8kN9Fyji96g@mail.gmail.com>
References: <CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com>
	<CAD21AoCdx5ZNS_cO7bYz1Zfb+Kw1kuJV2wtewrz7T1pPpjcWGw@mail.gmail.com>
	<CAJDiXgi6ZQOoSEqj9RyZMEh+HHBtmW0+PHD85UNPtKch8ubvdg@mail.gmail.com>
	<CAD21AoBcoA-i-pJ_=y+jg14R8_QaJA1iwktCnu5i-C=yXDFPdA@mail.gmail.com>
	<CAJDiXgjnUdE6Sk4M0unmT+9dULyFAxcum2txQKpWTuo4uQ_oXQ@mail.gmail.com>
	<CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>
	<CAA5RZ0vfBg=c_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag@mail.gmail.com>
	<CAD21AoBgvUeWS8ZsXBahA1XdYayK6DJ6dx49d6Xpii-iH+Hrwg@mail.gmail.com>
	<CAA5RZ0vF+Lr-jU1LAZWTGUjboUETk8oLvaNBbA5ozX6dau+how@mail.gmail.com>
	<CAJDiXggueLSGMNRmLshbmFRfbo4jzks0W8bLDfUSRZ-61fPVEQ@mail.gmail.com>
	<CAFY6G8cJ=DRTX75pOGerH6sk39dRt+7MSH+y_qppDdhPs=qdQA@mail.gmail.com>
	<CAJDiXgg1t6wk9NjyMUTm1iKqM9GtdQ_wrEchBtz3xjWBZM8W8A@mail.gmail.com>
	<CAD21AoAC0=Xi38RQcAO4A+vdmoXToZMoHfbS=KLT49fAOTH_gA@mail.gmail.com>
	<CAJDiXgiD+AZKhJSn-FSRVQxtDLmJd95wDu4wtKniQF5==1JcjQ@mail.gmail.com>
	<CAD21AoAM8KsqNhrZYJuf7odvxcTC0TumXazJc-r_wC5KnDFDPg@mail.gmail.com>
	<CAJDiXghbcOC9OOj3ampxuyqXH0geggnosnrYUHGygkpss-RtxA@mail.gmail.com>
	<CAD21AoAPnq0vrcGgeN++r1GoL8Kza7jaGL=TNzuBn6+MkR=rUQ@mail.gmail.com>
	<CAJDiXghmsbTmnm--9B5bbuZXa1OL7SZ0HYppX3tx9XsdwfJBhA@mail.gmail.com>
	<[email protected]>
	<CAJDiXgiYiX+azuR76DcVx8fZn57m_4v6cB14-GW34mWa=qudFQ@mail.gmail.com>
	<CAD21AoDtPpkkQ_h1yf4oTx1qn4SRdTeVY3qs+9J07fYqa_4Gww@mail.gmail.com>
	<CAJDiXgi7KB7wSQ=Ux=ngdaCvJnJ5x-ehvTyiuZez+5uKHtV6iQ@mail.gmail.com>
	<CAD21AoCcHKKXsr9Oh736ejckqqS1i430xGEyJ=JP5OL0ExyP1A@mail.gmail.com>
	<CAJDiXghaFT_1sSv3q8mjyZ_RLZDgiogg0mWRvLxSWvkUi2CcLg@mail.gmail.com>
	<CAA5RZ0u63W41OmcEO+HLs4CSo-Sd3J+Q-4=04iud8V=xX4iUrA@mail.gmail.com>
	<CAJDiXgin1TXniVGJKzOTA=F9K342uVfm6O0EmubTVB=F+XSrbA@mail.gmail.com>
	<CAD21AoDadzAwibxf-+urjx=XL+eVu8=Ut-Lh2GxXUt32LbPG3Q@mail.gmail.com>
	<CAD21AoD6HhraqhOgkQJOrr0ixZkAZuqJRpzGv-B+_-ad6d5aPw@mail.gmail.com>
	<CAJDiXgiGSpqMQSOx-cVO_LtcB5GWHBy9ph7oOR4ebbX8A==kgw@mail.gmail.com>
	<CAD21AoBRRXbNJEvCjS-0XZgCEeRBzQPKmrSDjJ3wZ8TN28vaCQ@mail.gmail.com>
	<CAPpHfduBJfMcojvmYHUo8b_C=0cxRy1N+tNiNGoA3RAZq2ApaA@mail.gmail.com>
	<CAD21AoC82NeHKXc965pPUZO2eyo1U7P6cmfRJbrcPDcnd7_6hw@mail.gmail.com>
	<CAJDiXghP2kXnEz+cj3rAWNM3NdKSB_4WtnngFXpVz2omPhGr5A@mail.gmail.com>
	<CAD21AoA0bnRZC_OqKMnH-Ln+OZ9z9k56j2c_MXj8pw69O-wkBw@mail.gmail.com>
	<CAA5RZ0sSXDza7_nUUbhHL_Sws+M+HR1daKJPXHpdLuNCkwUgUg@mail.gmail.com>
	<CAJDiXggrBsbzOisf+Nu8pZkYGrpUZaFbosL1Wbm3kKxzTm4xgw@mail.gmail.com>
	<CAA5RZ0tbiPcgQEjnhdnjz6qSjfRsGrr8jGCaMcrMaoPpax3wig@mail.gmail.com>
	<CAJDiXgjt5ZmK2uvS0E8Ztt5ePYmq8Ze_dG05Zo2NUsKLHCEuYA@mail.gmail.com>
	<CAD21AoB7v5tLPXLK=qmtt6PaEC1f+Fb-gh+MwAbXfm6x4eZGNw@mail.gmail.com>
	<CAJDiXghwtUbiFnAh3nSaxTk8KFupQuMbp+g4z3wOLoQfMuqgDg@mail.gmail.com>
	<CAJDiXgjoNd4BF19HNY_FAcDUqiqsfw8cGhNOJwBxahB8P38E3Q@mail.gmail.com>
	<CAD21AoBT1LWqPZkcHpVMVh0ZOXUneO=p61t0i8cQ+kOP9qfODQ@mail.gmail.com>
	<CAJDiXggL=J0nV7PfBsMW9+UOU3KUp1jNBM9Gov1JvAX7aG_U1g@mail.gmail.com>
	<CAD21AoDz-1Zf9DOJJrdcB2=eNA4UdywthkowNp_dHmOGC-yV_g@mail.gmail.com>
	<CAJDiXgjzphJ313=aDwbvryHpmTi6AqE+-5crysTtzKv01-vkzA@mail.gmail.com>
	<CAD21AoD7_4gsQ2a82zO3SaRwjdw_3tyiYDHNFPUKQ5DAA5HOtA@mail.gmail.com>
	<CAJDiXggY1QzNde6_HhpzneLc9dYqmWZ+PY39cuBXYdcCTuoJBA@mail.gmail.com>
	<CAD21AoCFPiS2jcMA1JaV1kT8xrGz5BpN7iBP_gCgRuaANEbciA@mail.gmail.com>
	<CAJDiXgh6jmNGR3uOB_6YeGhNkR2=HdTdEYjmHXdumNzyY4MckQ@mail.gmail.com>
	<CAD21AoDs3SOXeAEoCRizfEKybpRkE7t7poX0+iZ6MM1MFWMsfA@mail.gmail.com>
	<CAJDiXgjTkuqSPerC_nasxDz6d2Komf1ipYKV6SupDRnc9yhO9w@mail.gmail.com>
	<CAD21AoAXMjX03h5K84u0heBLU+fqGgWBGBDwnBDGSs=DhyF9pQ@mail.gmail.com>
	<CAJDiXghjZEAYboGhujgGvY9=RiFD01ERHVVF+NQMuuAKVZDmDQ@mail.gmail.com>
	<CAD21AoAD+N5SxBr0qL7TeWnvq4iYmFT=DyWdNLQPB-XntYkwEg@mail.gmail.com>
	<CAJDiXgjgn87sH1-MmONPKkeYJG83C0ChrYkYn9UcRonLhOOfOw@mail.gmail.com>
	<CAD21AoCoJYauWO78M-CGdHpYfcqEZVV5a1Z-7wWB=-G-x8EVFg@mail.gmail.com>
	<CAJDiXghaazbrQMZZS08d9Ffh2y4w05TgH9dpBhqChv1qNTp+xA@mail.gmail.com>
	<CAD21AoDbaNtLrFRxG9OG5WrBd7DCs4q+CfJd8AJTBEqRri4WeQ@mail.gmail.com>
	<CAJDiXgjjd1jL86B--AyRo2tDM1Wiu+7Pduwh5d0u_UM8GRugvw@mail.gmail.com>
	<CAD21AoBo_wS7y0X7_7ajEFkptzo9ZrF8RFNRnu2Xe8XL74o0SQ@mail.gmail.com>
	<CAJDiXggH1bW=4n+55CGLvs_sRU4SYNXwYLZ37wvJ5H_3yURSPw@mail.gmail.com>
	<CAD21AoDxhN8Z6Lx1ZicBXKkbMsRQqEXiq4ALs4uaD648iSvXoA@mail.gmail.com>
	<CAJDiXgh3Dg2f5k3xRJnzoY39jQENUhh125ArYapXkSu5D7JJuw@mail.gmail.com>
	<CAD21AoBYc7L7W4dRdxeoJzOH5OgpiCAtKz-54iX4Ufn8PnQoww@mail.gmail.com>
	<CAJDiXgi8X-DMb92v5WHLCNxDHxH9gO8WQxOMtdpmU7X=WXCiuQ@mail.gmail.com>
	<CAD21AoDKxs0UrTwa3rkP+kE9AzccabpK7G-Tk=HYneaFTZBtiA@mail.gmail.com>
	<CALj2ACUJ0TtYWtFuXXVf0aLES8tfZePXnB8WQ=0KCrNaABzQVg@mail.gmail.com>
	<CAJDiXgj=-R1z7H7+npm-o+q6YBkr5_6Qe=1wcy47ovAqej4TkA@mail.gmail.com>
	<CAHg+QDehxaJEd1Yp1MpW8UO71xmbasy7t2GZGvqOYwkr0md8DQ@mail.gmail.com>
	<CAJDiXgi73x7h0=UoXriFjskRB6htZ-uqXKqvWN3RefuxbP93gA@mail.gmail.com>
	<CAHg+QDdRyxC-cBk7CK-=pnfqFNWh6BeFDDnz3CSUPyoTbdUJ+A@mail.gmail.com>
	<CAD21AoAwcevLebQO6+MuWoOi9XNrNMg7gubFbLao2QkrRbOfMQ@mail.gmail.com>
	<CAPpHfduXzE7OeP0QgPEBhG8-4xg=wGgjoJi9c6-8kN9Fyji96g@mail.gmail.com>

On Thu, Apr 2, 2026 at 4:02 AM Alexander Korotkov <[email protected]> wrote:
>
> Hi!
>
> On Wed, Apr 1, 2026 at 9:55 PM Masahiko Sawada <[email protected]> wrote:
> >
> > On Mon, Mar 30, 2026 at 5:14 PM SATYANARAYANA NARLAPURAM
> > <[email protected]> wrote:
> > >
> > > Hi
> > >
> > > On Mon, Mar 30, 2026 at 1:44 AM Daniil Davydov <[email protected]> wrote:
> > >>
> > >> Hi,
> > >>
> > >> On Mon, Mar 30, 2026 at 7:17 AM SATYANARAYANA NARLAPURAM
> > >> <[email protected]> wrote:
> > >> >
> > >> > Thank you for working on this, very useful feature. Sharing a few thoughts:
> > >> >
> > >> > 1. Shouldn't we also cap by max_parallel_workers to avoid wasting DSM resources in parallel_vacuum_compute_workers?
> > >>
> > >> Actually, autovacuum_max_parallel_workers is already limited by
> > >> max_parallel_workers. It is not clear to me why we allow setting this GUC
> > >> higher than max_parallel_workers, but if this happens, I think it is a user
> > >> misconfiguration.
> > >>
> > >> > 2. Is it intentional that other autovacuum workers do not yield cost limits to the parallel autovacuum workers? Cost limits are first distributed equally among the autovacuum workers,
> > >> > and the parallel workers then share that allotment. Therefore, parallel workers will be heavily throttled. IIUC, this problem doesn't exist with manual vacuum.
> > >> > If we don't fix this, we should at least document it.
> > >>
> > >> Parallel a/v workers inherit cost-based parameters (including the
> > >> vacuum_cost_limit) from the leader worker. Do you mean that this value can be
> > >> too low for parallel operation? If so, the user can manually increase the
> > >> vacuum_cost_limit reloption for those tables where parallel a/v sleeps too
> > >> much (due to cost delay).
> > >>
> > >> BTW, describing the cost limit propagation to the parallel a/v workers is
> > >> worth mentioning in the documentation. I'll add it in the next patch version.
> > >>
> > >> > 3. Additionally, is there a point where, based on the cost limits, launching additional workers becomes counterproductive compared to running fewer workers and preventing it?
> > >>
> > >> I don't think that we can possibly find a universal limit that will be
> > >> appropriate for all possible configurations. For now we are using a pretty
> > >> simple formula for parallel degree calculation. Since users have several ways
> > >> to affect this formula, I guess that there will be no problems with it (except
> > >> my concerns about opt-out style).
> > >>
> > >> > 4. Would it make sense to add a table level override to disable parallelism or set parallel worker count?
> > >>
> > >> We already have the "autovacuum_parallel_workers" reloption that is used as
> > >> an additional limit for the number of parallel workers. In particular, this
> > >> reloption can be used to disable parallelism altogether.
> > >>
> > >> >
> > >> > I ran some perf tests to show the improvements with parallel vacuum and shared below.
> > >>
> > >> Thank you very much!
> > >>
> > >> > Observations:
> > >> >
> > >> > 1. Parallel autovacuum provides consistent speedup. With cost_limit=200 and
> > >> >    7 workers, vacuum completes 1.41x faster (71s -> 50s). With cost_limit=60,
> > >> >    the speedup is 1.25x (194s -> 154s).
> > >> > 2. I see the benefit comes from parallelizing index vacuum. With 8 indexes totaling
> > >> >    ~530 MB, parallel workers scan indexes concurrently instead of the leader
> > >> >    scanning them one by one. The leader's CPU user time drops from ~3s to
> > >> >    ~0.8s as index work is offloaded
> > >> >
> > >>
> > >> A 1.41x speedup with 7 parallel workers may not seem like a great win, but it
> > >> covers the whole autovacuum operation (not only index bulkdel/cleanup) with
> > >> pretty small indexes.
> > >>
> > >> May I ask you to run the same test with a larger table (several dozen
> > >> gigabytes)? I think the results will be more "expressive".
> > >
> > >
> > > I ran it with a billion rows in a table with 8 indexes. The improvement with 7 workers is 1.8x.
> > > Please note that there is a fixed overhead in other vacuum steps, for example the heap scan.
> > > In environments where cost-based delay is used (the default), benefits will be modest
> > > unless vacuum_cost_delay is set to a sufficiently large value.
> > >
> > > Hardware:
> > >   CPU:     Intel Xeon Platinum 8573C, 1 socket × 8 cores × 2 threads = 16 vCPUs
> > >   RAM:     128 GB (131,900 MB)
> > >   Swap:    None
> > >
> > > Workload Description
> > >
> > > Table Schema:
> > >   CREATE TABLE avtest (
> > >       id       bigint PRIMARY KEY,
> > >       col1     int,           -- random()*1e9
> > >       col2     int,           -- random()*1e9
> > >       col3     int,           -- random()*1e9
> > >       col4     int,           -- random()*1e9
> > >       col5     int,           -- random()*1e9
> > >       col6     text,          -- 'text_' || random()*1e6  (short text ~10 chars)
> > >       col7     timestamp,     -- now() - random()*365 days
> > >       padding  text           -- repeat('x', 50)
> > >   ) WITH (fillfactor = 90);
> > >
> > > Indexes (8 total):
> > >   avtest_pkey   — btree on (id)        bigint
> > >   idx_av_col1   — btree on (col1)      int
> > >   idx_av_col2   — btree on (col2)      int
> > >   idx_av_col3   — btree on (col3)      int
> > >   idx_av_col4   — btree on (col4)      int
> > >   idx_av_col5   — btree on (col5)      int
> > >   idx_av_col6   — btree on (col6)      text
> > >   idx_av_col7   — btree on (col7)      timestamp
> > >
> > > Dead Tuple Generation:
> > >   DELETE FROM avtest WHERE id % 5 IN (1, 2);
> > >   This deletes exactly 40% of rows, uniformly distributed across all pages.
> > >
> > > Vacuum Trigger:
> > >   Autovacuum is triggered naturally by lowering the threshold to 0 and setting
> > >   scale_factor to a value that causes immediate launch after the DELETE.
> > >
> > > Worker Configurations Tested:
> > >   0 workers  — leader-only vacuum (baseline, no parallelism)
> > >   2 workers  — leader + 2 parallel workers (3 processes total)
> > >   4 workers  — leader + 4 parallel workers (5 processes total)
> > >   7 workers  — leader + 7 parallel workers (8 processes total, 1 per index)
> > >
> > > Dataset:
> > >   Rows:         1,000,000,000
> > >   Heap size:    139 GB
> > >   Total size:   279 GB (heap + 8 indexes)
> > >   Dead tuples:  400,000,000 (40%)
> > >
> > > Index Sizes:
> > >   avtest_pkey    21 GB   (bigint)
> > >   idx_av_col7    21 GB   (timestamp)
> > >   idx_av_col1    18 GB   (int)
> > >   idx_av_col2    18 GB   (int)
> > >   idx_av_col3    18 GB   (int)
> > >   idx_av_col4    18 GB   (int)
> > >   idx_av_col5    18 GB   (int)
> > >   idx_av_col6     7 GB   (text — shorter keys, smaller index)
> > >   Total indexes: 139 GB
> > >
> > > Server Settings:
> > >   shared_buffers                = 96GB
> > >   maintenance_work_mem          = 1GB
> > >   max_wal_size                  = 100GB
> > >   checkpoint_timeout            = 1h
> > >   autovacuum_vacuum_cost_delay  = 0ms (NO throttling)
> > >   autovacuum_vacuum_cost_limit  = 1000
> > >
> > >
> > > Summary:
> > >
> > > Workers  Avg(s)    Min(s)    Max(s)    Speedup   Time Saved
> > > -------  ------    ------    ------    -------   ----------
> > > 0        1645.93   1645.01   1646.84    1.00x          —
> > > 2        1276.35   1275.64   1277.05    1.29x     369.58s (6.2 min)
> > > 4        1052.62   1048.92   1056.32    1.56x     593.31s (9.9 min)
> > > 7         892.23    886.59    897.86    1.84x     753.70s (12.6 min)
> > >
> >
> > Thank you for sharing the performance test results!
> >
> > While the benchmark results look good to me, have you compared the
> > performance differences between parallel vacuum in the VACUUM command
> > (with the PARALLEL option) and parallel vacuum in autovacuum? Since
> > parallel autovacuum introduces some logic to check for delay parameter
> > updates, I thought it was worth verifying if this adds any overhead.
> >
> > BTW, in my view, the most challenging part of this patch is the
> > propagation logic for vacuum delay parameters. This propagation is
> > necessary because, unlike manual VACUUM, autovacuum workers can reload
> > their configuration during operation. We must ensure that parallel
> > workers stay synchronized with these updated parameters.
> >
> > The current patch implements this in vacuumparallel.c: the leader
> > shares delay parameters in DSM and updates them (if any vacuum delay
> > parameters are updated) after a config reload, while workers poll for
> > updates at every vacuum_delay_point() call to refresh their local
> > variables.
> >
> > Another possible approach would be an event-driven model where the
> > leader notifies workers after updating shared parameters—for example,
> > by adding a shm_mq between the leader (as the sender) and each worker
> > (as the receiver).
> >
> > I've compared these two ideas and opted for the former (polling).
> > While a polling approach could theoretically be costly, the current
> > implementation is self-contained within the parallel vacuum logic and
> > does not touch the core parallel query infrastructure. The
> > notification approach might look more elegant, but I'm concerned it
> > adds unnecessary complexity just for the autovacuum case. Since the
> > polling is essentially just checking an atomic variable, the overhead
> > should be negligible.
> >
> > To verify this, I conducted benchmarks comparing the whole execution
> > time and index vacuuming duration.
> >
> > Setup:
> >
> > - Disabled (auto) vacuum delays and buffer usage limits.
> > - Parallel autovacuum with 1 worker on a table with 2 indexes (approx.
> > 4 GB each).
> > - 5 runs.
> >
> > Case 1: The latest patch (with polling)
> >
> > Average: 3.95s (Index: 1.54s)
> > Median: 3.62s (Index: 1.37s)
> >
> > Case 2: The latest patch without polling
> >
> > Average: 3.98s (Index: 1.56s)
> > Median: 3.70s (Index: 1.40s)
> >
> > Note that in order to simulate the code that doesn't have the polling,
> > I reverted the following change:
> >
> > -   if (InterruptPending ||
> > -       (!VacuumCostActive && !ConfigReloadPending))
> > +   if (InterruptPending)
> > +       return;
> > +
> > +   if (IsParallelWorker())
> > +   {
> > +       /*
> > +        * Update cost-based vacuum delay parameters for a parallel autovacuum
> > +        * worker if any changes are detected.
> > +        */
> > +       parallel_vacuum_update_shared_delay_params();
> > +   }
> > +
> > +   if (!VacuumCostActive && !ConfigReloadPending)
> >
> > The parallel vacuum workers don't check the shared vacuum delay
> > parameter at all, which is still fine as I disabled vacuum delays.
> >
> > Overall, the results show no noticeable overhead from the polling approach.
>
> I would say this polling approach is very cheap.  When there are no
> updates, it only has to check a single 32-bit value from shared
> memory.  And that value doesn't get updated frequently; it's good for
> caching.  No wonder we see no measurable overhead.

Thank you for the comments!

>
> Regarding the event-driven approach, given that the parallel worker
> process is busy with other jobs (doing actual vacuuming), it would
> anyway have to poll for new events from time to time.  Thus, I don't
> think it's possible to organize polling for new events any cheaper
> than the current approach of polling for updates in shmem.

What do you think about using proc signals, as in the patch I sent
recently [1]? With that approach, workers only have to check a local
variable rather than shared memory. It seems slightly cheaper and can
reuse the existing logic.
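
For illustration only, here is a minimal standalone C sketch of the two
checks being compared. It is not code from either patch: every name in
it (SharedDelayParams, LocalDelayParams, leader_publish,
worker_poll_shared, params_update_pending, worker_check_flag) is
hypothetical, and plain C11 atomics plus a sig_atomic_t flag stand in
for PostgreSQL's shared-memory and signal-handling primitives. It also
assumes a single leader and omits the lock that real code would need
around the parameter fields.

```c
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the delay parameters the leader keeps in DSM. */
typedef struct SharedDelayParams
{
    atomic_uint_fast32_t generation;    /* bumped after every update */
    double      cost_delay_ms;
    int         cost_limit;
} SharedDelayParams;

/* Hypothetical worker-local copy of the same parameters. */
typedef struct LocalDelayParams
{
    uint_fast32_t seen_generation;
    double      cost_delay_ms;
    int         cost_limit;
} LocalDelayParams;

/*
 * Leader side: store the new values, then bump the generation counter.
 * A real implementation would also protect the fields with a lock.
 */
static void
leader_publish(SharedDelayParams *shared, double delay_ms, int limit)
{
    shared->cost_delay_ms = delay_ms;
    shared->cost_limit = limit;
    atomic_fetch_add_explicit(&shared->generation, 1, memory_order_release);
}

/*
 * Polling variant: one atomic load from shared memory on every call;
 * the fields are copied only when the generation has changed.
 */
static bool
worker_poll_shared(SharedDelayParams *shared, LocalDelayParams *local)
{
    uint_fast32_t gen = atomic_load_explicit(&shared->generation,
                                             memory_order_acquire);

    if (gen == local->seen_generation)
        return false;           /* common case: nothing changed */

    local->cost_delay_ms = shared->cost_delay_ms;
    local->cost_limit = shared->cost_limit;
    local->seen_generation = gen;
    return true;
}

/* Signal variant: the handler only sets a process-local flag. */
static volatile sig_atomic_t params_update_pending = 0;

static void
handle_params_update(int signo)
{
    (void) signo;
    params_update_pending = 1;
}

/* The hot path reads the local flag and touches shared memory only if set. */
static bool
worker_check_flag(SharedDelayParams *shared, LocalDelayParams *local)
{
    if (!params_update_pending)
        return false;

    params_update_pending = 0;
    return worker_poll_shared(shared, local);
}
```

In both variants the fast path is a single load and a branch. The only
difference is whether that load targets shared memory or a
process-local flag, which is why I would expect the gap to be small.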

[1] https://www.postgresql.org/message-id/CAD21AoBm0cxQjtWuY0f7%2BaT4UiRV%2B%2BaFKkzjj6vmERTj_UFnxA%40ma...

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
