Re: POC: Parallel processing of indexes in autovacuum

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Masahiko Sawada <[email protected]>
To: Daniil Davydov <[email protected]>
Cc: Sami Imseih <[email protected]>
Cc: Alexander Korotkov <[email protected]>
Cc: Matheus Alcantara <[email protected]>
Cc: Maxim Orlov <[email protected]>
Cc: Postgres hackers <[email protected]>
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date: Wed, 14 Jan 2026 18:13:16 -0800
Message-ID: <CAD21AoBT1LWqPZkcHpVMVh0ZOXUneO=p61t0i8cQ+kOP9qfODQ@mail.gmail.com> (raw)
In-Reply-To: <CAJDiXgjoNd4BF19HNY_FAcDUqiqsfw8cGhNOJwBxahB8P38E3Q@mail.gmail.com>
References: <CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com>
	<CAD21AoCdx5ZNS_cO7bYz1Zfb+Kw1kuJV2wtewrz7T1pPpjcWGw@mail.gmail.com>
	<CAJDiXgi6ZQOoSEqj9RyZMEh+HHBtmW0+PHD85UNPtKch8ubvdg@mail.gmail.com>
	<CAD21AoBcoA-i-pJ_=y+jg14R8_QaJA1iwktCnu5i-C=yXDFPdA@mail.gmail.com>
	<CAJDiXgjnUdE6Sk4M0unmT+9dULyFAxcum2txQKpWTuo4uQ_oXQ@mail.gmail.com>
	<CAD21AoBTZdVR93JBo620B=MX-K8cdm3VRbjrBr_Vcpngk3AjVw@mail.gmail.com>
	<CAA5RZ0vfBg=c_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag@mail.gmail.com>
	<CAD21AoBgvUeWS8ZsXBahA1XdYayK6DJ6dx49d6Xpii-iH+Hrwg@mail.gmail.com>
	<CAA5RZ0vF+Lr-jU1LAZWTGUjboUETk8oLvaNBbA5ozX6dau+how@mail.gmail.com>
	<CAJDiXggueLSGMNRmLshbmFRfbo4jzks0W8bLDfUSRZ-61fPVEQ@mail.gmail.com>
	<CAFY6G8cJ=DRTX75pOGerH6sk39dRt+7MSH+y_qppDdhPs=qdQA@mail.gmail.com>
	<CAJDiXgg1t6wk9NjyMUTm1iKqM9GtdQ_wrEchBtz3xjWBZM8W8A@mail.gmail.com>
	<CAD21AoAC0=Xi38RQcAO4A+vdmoXToZMoHfbS=KLT49fAOTH_gA@mail.gmail.com>
	<CAJDiXgiD+AZKhJSn-FSRVQxtDLmJd95wDu4wtKniQF5==1JcjQ@mail.gmail.com>
	<CAD21AoAM8KsqNhrZYJuf7odvxcTC0TumXazJc-r_wC5KnDFDPg@mail.gmail.com>
	<CAJDiXghbcOC9OOj3ampxuyqXH0geggnosnrYUHGygkpss-RtxA@mail.gmail.com>
	<CAD21AoAPnq0vrcGgeN++r1GoL8Kza7jaGL=TNzuBn6+MkR=rUQ@mail.gmail.com>
	<CAJDiXghmsbTmnm--9B5bbuZXa1OL7SZ0HYppX3tx9XsdwfJBhA@mail.gmail.com>
	<[email protected]>
	<CAJDiXgiYiX+azuR76DcVx8fZn57m_4v6cB14-GW34mWa=qudFQ@mail.gmail.com>
	<CAD21AoDtPpkkQ_h1yf4oTx1qn4SRdTeVY3qs+9J07fYqa_4Gww@mail.gmail.com>
	<CAJDiXgi7KB7wSQ=Ux=ngdaCvJnJ5x-ehvTyiuZez+5uKHtV6iQ@mail.gmail.com>
	<CAD21AoCcHKKXsr9Oh736ejckqqS1i430xGEyJ=JP5OL0ExyP1A@mail.gmail.com>
	<CAJDiXghaFT_1sSv3q8mjyZ_RLZDgiogg0mWRvLxSWvkUi2CcLg@mail.gmail.com>
	<CAA5RZ0u63W41OmcEO+HLs4CSo-Sd3J+Q-4=04iud8V=xX4iUrA@mail.gmail.com>
	<CAJDiXgin1TXniVGJKzOTA=F9K342uVfm6O0EmubTVB=F+XSrbA@mail.gmail.com>
	<CAD21AoDadzAwibxf-+urjx=XL+eVu8=Ut-Lh2GxXUt32LbPG3Q@mail.gmail.com>
	<CAD21AoD6HhraqhOgkQJOrr0ixZkAZuqJRpzGv-B+_-ad6d5aPw@mail.gmail.com>
	<CAJDiXgiGSpqMQSOx-cVO_LtcB5GWHBy9ph7oOR4ebbX8A==kgw@mail.gmail.com>
	<CAD21AoBRRXbNJEvCjS-0XZgCEeRBzQPKmrSDjJ3wZ8TN28vaCQ@mail.gmail.com>
	<CAPpHfduBJfMcojvmYHUo8b_C=0cxRy1N+tNiNGoA3RAZq2ApaA@mail.gmail.com>
	<CAD21AoC82NeHKXc965pPUZO2eyo1U7P6cmfRJbrcPDcnd7_6hw@mail.gmail.com>
	<CAJDiXghP2kXnEz+cj3rAWNM3NdKSB_4WtnngFXpVz2omPhGr5A@mail.gmail.com>
	<CAD21AoA0bnRZC_OqKMnH-Ln+OZ9z9k56j2c_MXj8pw69O-wkBw@mail.gmail.com>
	<CAA5RZ0sSXDza7_nUUbhHL_Sws+M+HR1daKJPXHpdLuNCkwUgUg@mail.gmail.com>
	<CAJDiXggrBsbzOisf+Nu8pZkYGrpUZaFbosL1Wbm3kKxzTm4xgw@mail.gmail.com>
	<CAA5RZ0tbiPcgQEjnhdnjz6qSjfRsGrr8jGCaMcrMaoPpax3wig@mail.gmail.com>
	<CAJDiXgjt5ZmK2uvS0E8Ztt5ePYmq8Ze_dG05Zo2NUsKLHCEuYA@mail.gmail.com>
	<CAD21AoB7v5tLPXLK=qmtt6PaEC1f+Fb-gh+MwAbXfm6x4eZGNw@mail.gmail.com>
	<CAJDiXghwtUbiFnAh3nSaxTk8KFupQuMbp+g4z3wOLoQfMuqgDg@mail.gmail.com>
	<CAJDiXgjoNd4BF19HNY_FAcDUqiqsfw8cGhNOJwBxahB8P38E3Q@mail.gmail.com>

On Wed, Jan 7, 2026 at 1:51 AM Daniil Davydov <[email protected]> wrote:
>
> Hi,
>
> On Tue, Jan 6, 2026 at 3:44 AM Daniil Davydov <[email protected]> wrote:
> >
> > On Tue, Jan 6, 2026 at 1:51 AM Masahiko Sawada <[email protected]> wrote:
> > >
> > > Are you still working on it? Or shall I draft this part on top of the
> > > 0001 patch?
> >
> > I thought about some "beautiful" approach, but for now I have
> > only one idea - force parallel a/v workers to get values for these
> > parameters from shmem (which obviously can be modified by the
> > leader a/v process). I'll post this patch in the near future.
> >
>
> I am posting a draft version of the patch (see 0005) that allows parallel
> leader to propagate changes of cost-based parameters to its parallel
> workers. It is a very rough fix, but it reflects my idea that is to have some
> shared state from which parallel workers can get values for the parameters
> (and which only leader worker can modify, obviously).
>
> I have also added a test that shows that this idea is working - the test
> ensures that parallel workers can change its parameters if they have been
> changed for the leader worker.
>
> Note that so far the work is in progress - this logic works only for
> vacuum_cost_delay and vacuum_cost_limits parameters. I think that we
> should agree on an idea first, and only then apply logic for all appropriate
> parameters.
>
> What do you think?

Thank you for updating the patches! Here are review comments.

* 0001 patch

+static void
+autovacuum_worker_before_shmem_exit(int code, Datum arg)
+{
+   if (code != 0)
+       AutoVacuumReleaseAllParallelWorkers();
+
+   Assert(av_nworkers_reserved == 0);
+}

While adding the assertion here makes sense, the assertion won't work
in non-assertion builds. I guess it's safer to call
AutoVacuumReleaseAllParallelWorkers() regardless of the code to ensure
that no autovacuum workers exit while holding parallel workers.

---
+   before_shmem_exit(autovacuum_worker_before_shmem_exit, 0);

I think it would be better to set this callback later like before the
main loop of processing the tables as it makes no sense even if we set
it very early.

---
+   /*
+    * Cap the number of free workers by new parameter's value, if needed.
+    */
+   AutoVacuumShmem->av_freeParallelWorkers =
+       Min(AutoVacuumShmem->av_freeParallelWorkers,
+           autovacuum_max_parallel_workers);
+
+   if (autovacuum_max_parallel_workers > prev_max_parallel_workers)
+   {
+       /*
+        * If user wants to increase number of parallel autovacuum workers, we
+        * must increase number of free workers.
+        */
+       AutoVacuumShmem->av_freeParallelWorkers +=
+           (autovacuum_max_parallel_workers - prev_max_parallel_workers);
+   }

Suppose the previous autovacuum_max_parallel_workers is 5 and there
are 2 workers are reserved (i.e., there are 3 free parallel workers),
if the autovacuum_max_parallel_workers changes to 2, the new
AutoVacuumShmem->av_freeParallelWorkers would be 2 based on the above
codes, but I believe that the new number of free workers should be 0
as there are already 2 workers are running. What do you think? I guess
we can calculate the new number of free workers by:

Max((autovacuum_max_parallel_workers - prev_max_parallel_workers) +
AutoVacuumShmem->av_freeParallelWorkers), 0)

---
I've attached a patch proposing some minor changes.


* 0002 patch

+   /*
+    * Number of planned and actually launched parallel workers for all index
+    * scans, or NULL
+    */
+   PVWorkersUsage *workers_usage;

I think that LVRelState can have PVWorkersUsage instead of a pointer to it.

---
+       /*
+        * Allocate space for workers usage statistics. Thus, we explicitly
+        * make clear that such statistics must be accumulated. For now, this
+        * is used only by autovacuum leader worker, because it must log it in
+        * the end of table processing.
+        */
+       vacrel->workers_usage = AmAutoVacuumWorkerProcess() ?
+           (PVWorkersUsage *) palloc0(sizeof(PVWorkersUsage)) :
+           NULL;

I think we can report the worker statistics even in VACUUM VERBOSE
logs. Currently VACUUM VERBOSE reports the worker usage just during
index vacuuming but it would make sense to report the overall
statistics in vacuum logs. It would help make VACUUM VERBOSE logs and
autovacuum logs consistent.

But we don't need to report the worker usage if we didn't use the
parallel vacuum (i.e., if npanned == 0).

---
+       /* Remember these values, if we asked to. */
+       if (wusage != NULL)
+       {
+           wusage->nlaunched += pvs->pcxt->nworkers_launched;
+           wusage->nplanned += nworkers;
+       }

This code runs after the attempt to reserve parallel workers.
Consequently, if we fail to reserve any workers due to
autovacuum_max_parallel_workers, we report the status as if parallel
vacuum wasn't planned at all. I think knowing the number of workers
that were planned but not reserved would provide valuable insight for
users tuning autovacuum_max_parallel_workers.

---
+           if (vacrel->workers_usage)
+               appendStringInfo(&buf,
+                                _("parallel index vacuum/cleanup :
workers planned = %d, workers launched = %d\n"),
+                                vacrel->workers_usage->nplanned,
+                                vacrel->workers_usage->nlaunched);

Since these numbers are the total number of workers planned and
launched, how about changing it to something "parallel index
vacuum/cleanup: %d workers were planned and %d workers were launched
in total"?


* 0003 patch

+typedef enum AVLeaderFaulureType
+{
+   FAIL_NONE,
+   FAIL_ERROR,
+   FAIL_FATAL,
+}          AVLeaderFaulureType;

I'm concerned that it is somewhat overwrapped with what injection
points does as we can set 'error' to injection_points_attach(). For
the FATAL error, we can terminate the autovacuum worker by using
pg_terminate_backend() that keeps waiting due to
injection_point_attach() with action='wait'.

---
+   /*
+    * Injection point to help exercising number of available parallel
+    * autovacuum workers.
+    */
+   INJECTION_POINT("autovacuum-set-free-parallel-workers-num",
+                   &AutoVacuumShmem->av_freeParallelWorkers);

This injection point is added to two places. IIUC the purpose of this
function is to update the free_parallel_workers of InjPointState. And
that value is taken by get_parallel_autovacuum_free_workers() SQL
function during the TAP test. I guess it's better to have
get_parallel_autovacuum_free_workers() function to direcly check
av_freeParallelWorkers with a proper locking.

---
It would be great if we could test the av_freeParallelWorkers
adjustment when max_parallel_maintenance_workers changes.


* 0005 patch

+typedef struct PVSharedCostParams
+{
+   slock_t     spinlock; /* protects all fields below */
+
+   /* Copies of corresponding parameters from autovacuum leader process */
+   double  cost_delay;
+   int     cost_limit;
+}      PVSharedCostParams;

Since Parallel workers don't reload the config file I think other
vacuum delay related parameters such as VacuumCostPage{Miss|Hit|Dirty}
also needs to be shared by the leader.

---
+   if (!AmAutoVacuumWorkerProcess())
+   {
+       /*
+        * If we are autovacuum parallel worker, check whether cost-based
+        * parameters had changed in leader worker.
+        * If so, vacuum_cost_delay and vacuum_cost_limit will be set to the
+        * values which leader worker is operating on.
+        *
+        * Do it before checking VacuumCostActive, because its value might be
+        * changed after leader's parameters consumption.
+        */
+       parallel_vacuum_fix_cost_based_params();
+   }

We need to add checks to prevent the normal backend running the VACUUM
command from calling parallel_vacuum_fix_cost_based_params().

IIUC autovacuum parallel workers would call
parallel_vacuum_fix_cost_based_params() and update their
vacuum_cost_{delay|limit} every vacuum_delay_point().

---
+/*
+ * Function to be called from parallel autovacuum worker in order to sync
+ * some cost-based delay parameter with the leader worker.
+ */
+bool
+parallel_vacuum_fix_cost_based_params(void)
+{

The 'fix' doesn't sound right to me as it's not broken actually. How
about something like parallel_vacuum_update_shared_delay_params?

+   Assert(IsParallelWorker() && !AmAutoVacuumWorkerProcess());
+
+   SpinLockAcquire(&pv_shared_cost_params->spinlock);
+
+   vacuum_cost_delay = pv_shared_cost_params->cost_delay;
+   vacuum_cost_limit = pv_shared_cost_params->cost_limit;
+
+   SpinLockRelease(&pv_shared_cost_params->spinlock);

IIUC autovacuum parallel workers seems to update their
vacuum_cost_{delay|limit} every vacuum_delay_point(), which seems not
good. Can we somehow avoid unnecessary updates?

---
+
+   if (vacuum_cost_delay > 0 && !VacuumFailsafeActive)
+       VacuumCostActive = true;
+

Should we consider the case of disabling VacuumCostActive as well?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/octet-stream] 0001_masahiko.patch (4.4K, 2-0001_masahiko.patch)
  download | inline diff:
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
index 6a3a00585f9..cb42d4e572f 100644
--- a/src/backend/commands/vacuumparallel.c
+++ b/src/backend/commands/vacuumparallel.c
@@ -656,7 +656,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
 	nworkers = Min(nworkers, pvs->pcxt->nworkers);
 
 	/*
-	 * Reserve workers in autovacuum global state. Note, that we may be given
+	 * Reserve workers in autovacuum global state. Note that we may be given
 	 * fewer workers than we requested.
 	 */
 	if (AmAutoVacuumWorkerProcess() && nworkers > 0)
@@ -706,15 +706,12 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
 
 		LaunchParallelWorkers(pvs->pcxt);
 
-		if (AmAutoVacuumWorkerProcess() &&
-			pvs->pcxt->nworkers_launched < nworkers)
-		{
-			/*
-			 * Tell autovacuum that we could not launch all the previously
-			 * reserved workers.
-			 */
+		/*
+		 * Tell autovacuum that we could not launch all the previously
+		 * reserved workers.
+		 */
+		if (AmAutoVacuumWorkerProcess() && pvs->pcxt->nworkers_launched < nworkers)
 			AutoVacuumReleaseParallelWorkers(nworkers - pvs->pcxt->nworkers_launched);
-		}
 
 		if (pvs->pcxt->nworkers_launched > 0)
 		{
@@ -765,7 +762,7 @@ parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scan
 		for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
 			InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
 
-		/* Also release all previously reserved parallel autovacuum workers */
+		/* Release all the reserved parallel workers for autovacuum */
 		if (AmAutoVacuumWorkerProcess() && pvs->pcxt->nworkers_launched > 0)
 			AutoVacuumReleaseParallelWorkers(pvs->pcxt->nworkers_launched);
 	}
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index bc11970bfee..6ccc88c4e1e 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -152,8 +152,9 @@ static double av_storage_param_cost_delay = -1;
 static int	av_storage_param_cost_limit = -1;
 
 /*
- * Variable to keep number of currently reserved parallel autovacuum workers.
- * It is only relevant for parallel autovacuum leader process.
+ * Tracks the number of parallel workers currently reserved by the
+ * autovacuum worker. This is non-zero only for the parallel autovacuum
+ * leader process.
  */
 static int	av_nworkers_reserved = 0;
 
@@ -3407,33 +3408,24 @@ AutoVacuumRequestWork(AutoVacuumWorkItemType type, Oid relationId,
 }
 
 /*
- * In order to meet the 'autovacuum_max_parallel_workers' limit, leader
- * autovacuum process must call this function during computing the parallel
- * degree.
+ * Reserves parallel workers for autovacuum.
  *
- * 'nworkers' is the desired number of parallel workers to reserve. Function
- * sets 'nworkers' to the number of parallel workers that actually can be
- * launched and reserves these workers (if any) in global autovacuum state.
- *
- * NOTE: We will try to provide as many workers as requested, even if caller
- * will occupy all available workers.
+ * nworkers is an in/out parameter; the requested number of parallel workers
+ * to reserve by the caller, and set to the actual number of reserved workers.
  */
 void
 AutoVacuumReserveParallelWorkers(int *nworkers)
 {
-	/* Only leader worker can call this function. */
+	/* Only leader autovacuum worker can call this function. */
 	Assert(AmAutoVacuumWorkerProcess() && !IsParallelWorker());
 
-	/*
-	 * We can only reserve workers at the beginning of parallel index
-	 * processing, so we must not have any reserved workers right now.
-	 */
+	/* The worker must not have any reserved workers yet */
 	Assert(av_nworkers_reserved == 0);
 
 	LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
 
 	/* Provide as many workers as we can. */
-	*nworkers= Min(AutoVacuumShmem->av_freeParallelWorkers, *nworkers);
+	*nworkers = Min(AutoVacuumShmem->av_freeParallelWorkers, *nworkers);
 	AutoVacuumShmem->av_freeParallelWorkers -= *nworkers;
 
 	/* Remember how many workers we have reserved. */
@@ -3632,8 +3624,8 @@ check_av_worker_gucs(void)
 }
 
 /*
- * Make sure that number of free parallel workers corresponds to the
- * autovacuum_max_parallel_workers parameter (after it was changed).
+ * Adjusts the number of free parallel workers corresponds to the new
+ * autovacuum_max_parallel_workers value.
  */
 static void
 adjust_free_parallel_workers(int prev_max_parallel_workers)

view thread (112+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: POC: Parallel processing of indexes in autovacuum
  In-Reply-To: <CAD21AoBT1LWqPZkcHpVMVh0ZOXUneO=p61t0i8cQ+kOP9qfODQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox