Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1umelB-001WYp-8L for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Aug 2025 20:41:17 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1umel4-009dV2-Tl for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Aug 2025 20:41:11 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1umel4-009dUu-DZ for pgsql-hackers@lists.postgresql.org; Thu, 14 Aug 2025 20:41:10 +0000 Received: from mail-lf1-x129.google.com ([2a00:1450:4864:20::129]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1umekx-000bYE-1C for pgsql-hackers@lists.postgresql.org; Thu, 14 Aug 2025 20:41:09 +0000 Received: by mail-lf1-x129.google.com with SMTP id 2adb3069b0e04-55ce5277b8eso1441671e87.2 for ; Thu, 14 Aug 2025 13:41:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1755204062; x=1755808862; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=WUr6JJIL/csj0zxo+Xat6OBKu/rCC4+KmITI2QfgSZo=; b=b+rbrVSzDicyX/QtLHfRKXzXjQfjCj0rtv2vSABImqe3Cq4ITjtNCzjfV8ZXqKFSNN JDTZCT5BdqEERugkl1v8pZSCuuXh+6GAhb+RLzeOF6targf0JtpOOkAR+eNZkYAwUUNY GJyWUWM5HxELUiA5xX0TtvZqkr/INNPMfqYMz7v1MHHyvfcc7AL1FbP93oFzHbz4dyLx n6YD4QoshS1dy/YKKrbTRrkY0ZvrAxY4wGPD3Qm3c648z+Hoa7aaUx3QCtVr56sA4vJ1 csYyQKHr0MzFJXJTev/fPHfX8GakdbS5T4mvPTfSkXKQX/k4llFfzIgX76EDXya4WI+P Ggow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1755204062; x=1755808862; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WUr6JJIL/csj0zxo+Xat6OBKu/rCC4+KmITI2QfgSZo=; b=orP/kNCpgoCwVvzGyTWPTo5FL9EaB1X8ahtw9vrmp3ae1iSVfAkqcIYnkiWNA3Z9qF NTNQgAGcv5nASul8xT3OkptCPZhYslXgV8ir1a1tuc2Iu/TuuuTtxkaYxLnQua9/2LOV PIDzRK1U4gqb7wFtdXEnBcX+AeDyyFvRgq/jyr6xgogv+9kOfF6sPBapxAfSe37Yv1lS IiPNeLtqkn7NETb80mW0rWBpCiKeTDvarzaTPtHhsMo0DfQvwGO26ZAluYks3eTO00Mu kOZji79pkcaaBu6JvHdRpT18rQeZYeB+YGC1e1FUUJmIDt/wcjCfiK3ZxlZ55Ossv2T4 pgWg== X-Forwarded-Encrypted: i=1; AJvYcCUkMhYbD21IvkL/YePq5nJMccYDh7jGro+0VOtxMim8s1svI5zNACRSFuQ/JHORtN+FhnZ5/VY/dH13scX8@lists.postgresql.org X-Gm-Message-State: AOJu0YzLfzeThfrCVxOhO/LO/x+wWuJuz+MLe9V5u3qZY7RgIiQpVsny 9VSV2WcHDsd5oTRvw1hpfWp88aWrPKh7zEqsQdHJQbOQ7L2LgxnRAVo0Zu19f13e1DgqE9VvunD 27d9l+dR0ByS8aQvgZ7XdvnwLktVpJfFGBPvM X-Gm-Gg: ASbGnct/qgOq4fMs1c6I8CzZdk02SMHmKGZzs90bRly1ilZMgHyQbvWXiZTKV3LfOlH gRlFZ8qORKjHKRKy47oq+jjoI7DXM/ysXfP7NiqI/TPPipIV3f7DqedDkfrl3elYJhD8uZXBySW jZAdJrYebTwus8DjLnYWg51nYjSpjlmuRPZS6aFHdWSOw5Jh0E5+x73M018OyMdsrEAoMj2DBz8 aiulA== X-Google-Smtp-Source: AGHT+IE9UTdWU2CB8RatzsRRn6cyTf9mrCDFJOqgcAYAOfNOmr7R2jba4K/izxyojFub02AsC14tTwn2f4gdWV7pGJY= X-Received: by 2002:ac2:4c55:0:b0:55b:87cc:eea9 with SMTP id 2adb3069b0e04-55ce4ffee38mr1640424e87.4.1755204061770; Thu, 14 Aug 2025 13:41:01 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Masahiko Sawada Date: Thu, 14 Aug 2025 13:40:25 -0700 X-Gm-Features: Ac12FXw5OSOgmB_TA0AwTt5V4BrtYcom-P8m_XFU6NDarYq894tMDYIsImEBnko Message-ID: Subject: Re: POC: Parallel processing of indexes in autovacuum To: Daniil Davydov <3danissimo@gmail.com> Cc: Sami Imseih , Matheus Alcantara , Maxim Orlov , Postgres hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Aug 7, 2025 at 4:38=E2=80=AFPM Masahiko Sawada wrote: > > On Mon, Jul 21, 2025 at 11:45=E2=80=AFPM Daniil Davydov <3danissimo@gmail= .com> wrote: > > > > Hi, > > > > On Mon, Jul 21, 2025 at 11:40=E2=80=AFPM Sami Imseih wrote: > > > > > > I have only reviewed the v8-0001-Parallel-index-autovacuum.patch so f= ar and > > > have a few comments from my initial pass. > > > > > > 1/ Please run pgindent. > > > > OK, I'll do it. > > > > > 2/ Documentation is missing. There may be more, but here are the plac= es I > > > found that likely need updates for the new behavior, reloptions, GUC,= etc. > > > Including docs in the patch early would help clarify expected behavio= r. > > > > > > https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM= -BASICS > > > https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVA= CUUM > > > https://www.postgresql.org/docs/current/runtime-config-autovacuum.htm= l > > > https://www.postgresql.org/docs/current/sql-createtable.html > > > https://www.postgresql.org/docs/current/sql-altertable.html > > > https://www.postgresql.org/docs/current/runtime-config-resource.html#= GUC-MAX-WORKER-PROCESSES > > > https://www.postgresql.org/docs/current/runtime-config-resource.html#= GUC-MAX-PARALLEL-MAINTENANCE-WORKERS > > > > > > > Thanks for gathering it all together. I'll update the documentation so > > it will reflect changes in autovacuum daemon, reloptions and GUC > > parameters. So far, I don't see what we can add to vacuum-basics > > and alter-table paragraphs. > > > > I'll create separate .patch file for changes in documentation. > > > > > One thing I am unclear on is the interaction between max_worker_proce= sses, > > > max_parallel_workers, and max_parallel_maintenance_workers. For examp= le, does > > > the following change mean that manual VACUUM PARALLEL is no longer ca= pped by > > > max_parallel_maintenance_workers? > > > > > > @@ -597,8 +614,8 @@ parallel_vacuum_compute_workers(Relation *indrels= , > > > int nindexes, int nrequested, > > > parallel_workers =3D (nrequested > 0) ? > > > Min(nrequested, nindexes_parallel) : nindexes_paralle= l; > > > > > > - /* Cap by max_parallel_maintenance_workers */ > > > - parallel_workers =3D Min(parallel_workers, > > > max_parallel_maintenance_workers); > > > + /* Cap by GUC variable */ > > > + parallel_workers =3D Min(parallel_workers, max_parallel_worke= rs); > > > > > > > Oh, it is my poor choice of a name for a local variable (I'll rename it= ). > > This variable can get different values depending on performed operation= : > > autovacuum_max_parallel_workers for parallel autovacuum and > > max_parallel_maintenance_workers for maintenance VACUUM. > > > > > > > > 3/ Shouldn't this be "max_parallel_workers" instead of "bgworkers poo= l" ? > > > > > > + "autovacuum_parallel_workers", > > > + "Maximum number of parallel autovacuum worker= s > > > that can be taken from bgworkers pool for processing this table. " > > > > > > > I don't think that we should refer to max_parallel_workers here. > > Actually, this reloption doesn't depend on max_parallel_workers at all. > > I wrote about bgworkers pool (both here and in description of > > autovacuum_max_parallel_workers parameter) in order to clarify that > > parallel autovacuum will use dynamic workers instead of launching > > more a/v workers. > > > > BTW, I don't really like that the comment on this option turns out to b= e > > very large. I'll leave only short description in reloptions.c and move > > clarification about zero value in rel.h > > Mentions of bgworkers pool will remain only in > > description of autovacuum_max_parallel_workers. > > > > > 4/ The comment "When parallel autovacuum worker die" suggests an abno= rmal > > > exit. "Terminates" seems clearer, since this applies to both normal a= nd > > > abnormal exits. > > > > > > instead of: > > > + * When parallel autovacuum worker die, > > > > > > how about this: > > > * When parallel autovacuum worker terminates, > > > > > > > Sounds reasonable, I'll fix it. > > > > > > > > 5/ Any reason AutoVacuumReleaseParallelWorkers cannot be called befor= e > > > DestroyParallelContext? > > > > > > + nlaunched_workers =3D pvs->pcxt->nworkers_launched; /* rememb= er > > > this value */ > > > DestroyParallelContext(pvs->pcxt); > > > + > > > + /* Release all launched (i.e. reserved) parallel autovacuum w= orkers. */ > > > + if (AmAutoVacuumWorkerProcess()) > > > + AutoVacuumReleaseParallelWorkers(nlaunched_workers); > > > > > > > I wrote about it above [1], but I think I can duplicate my thoughts her= e : > > """ > > Destroying parallel context includes waiting for all workers to exit (a= fter > > which, other operations can use them). > > If we first call ParallelAutoVacuumReleaseWorkers, some operation can > > reasonably request all released workers. But this request can fail, > > because there is no guarantee that workers managed to finish. > > > > Actually, there's nothing wrong with that, but I think releasing worker= s > > only after finishing work is a more logical approach. > > """ > > > > > > > > 6/ Also, would it be cleaner to move AmAutoVacuumWorkerProcess() insi= de > > > AutoVacuumReleaseParallelWorkers()? > > > > > > if (!AmAutoVacuumWorkerProcess()) > > > return; > > > > > > > It seems to me that the opposite is true. If there is no alternative to= calling > > AmAutoVacuumWorkerProcess, it might confuse somebody. All doubts > > will disappear after viewing the AmAutoVacuumWorkerProcess code, > > but IMO code in vacuumparallel.c will become less intuitive. > > > > > 7/ It looks like the psql tab completion for autovacuum_parallel_work= ers is > > > missing: > > > > > > test=3D# alter table t set (autovacuum_ > > > autovacuum_analyze_scale_factor > > > autovacuum_analyze_threshold > > > autovacuum_enabled > > > autovacuum_freeze_max_age > > > autovacuum_freeze_min_age > > > autovacuum_freeze_table_age > > > autovacuum_multixact_freeze_max_age > > > autovacuum_multixact_freeze_min_age > > > autovacuum_multixact_freeze_table_age > > > autovacuum_vacuum_cost_delay > > > autovacuum_vacuum_cost_limit > > > autovacuum_vacuum_insert_scale_factor > > > autovacuum_vacuum_insert_threshold > > > autovacuum_vacuum_max_threshold > > > autovacuum_vacuum_scale_factor > > > autovacuum_vacuum_threshold > > > > > > > Good catch, I'll fix it. > > > > Thank you for the review! Please, see v9 patches : > > 1) Run pgindent + rebase patches on newest commit in master. > > 2) Introduce changes for documentation. > > 3) Rename local variable in parallel_vacuum_compute_workers. > > 4) Shorten the description of autovacuum_parallel_workers in > > reloptions.c (move clarifications for it into rel.h). > > 5) Reword "When parallel autovacuum worker die" comment. > > 6) Add tab completion for autovacuum_parallel_workers table option. > > Thank you for updating the patch. Here are some review comments. > > + /* Release all launched (i.e. reserved) parallel autovacuum work= ers. */ > + if (AmAutoVacuumWorkerProcess()) > + AutoVacuumReleaseParallelWorkers(nlaunched_workers); > + > > We release the reserved worker in parallel_vacuum_end(). However, > parallel_vacuum_end() is called only once at the end of vacuum. I > think we need to release the reserved worker after index vacuuming or > cleanup, otherwise we would end up holding the reserved workers until > the end of vacuum even if we invoke index vacuuming multiple times. > > --- > +void > +assign_autovacuum_max_parallel_workers(int newval, void *extra) > +{ > + autovacuum_max_parallel_workers =3D Min(newval, max_worker_proce= sses); > +} > > I don't think we need the assign hook for this GUC parameter. We can > internally cap the maximum value by max_worker_processes like other > GUC parameters such as max_parallel_maintenance_workers and > max_parallel_workers. > > ---+ /* Refresh autovacuum_max_parallel_workers paremeter */ > + CHECK_FOR_INTERRUPTS(); > + if (ConfigReloadPending) > + { > + ConfigReloadPending =3D false; > + ProcessConfigFile(PGC_SIGHUP); > + } > + > + LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE); > + > + /* > + * If autovacuum_max_parallel_workers parameter was reduced duri= ng > + * parallel autovacuum execution, we must cap available > workers number by > + * its new value. > + */ > + AutoVacuumShmem->av_freeParallelWorkers =3D > + Min(AutoVacuumShmem->av_freeParallelWorkers + nworkers, > + autovacuum_max_parallel_workers); > + > + LWLockRelease(AutovacuumLock); > > I think another race condition could occur; suppose > autovacuum_max_parallel_workers is set to '5' and one autovacuum > worker reserved 5 workers, meaning that > AutoVacuumShmem->av_freeParallelWorkers is 0. Then, the user changes > autovacuum_max_parallel_workers to 3 and reloads the conf file right > after the autovacuum worker checks the interruption. The launcher > processes calls adjust_free_parallel_workers() but > av_freeParallelWorkers remains 0, and the autovacuum worker increments > it by 5 as its autovacuum_max_parallel_workers value is still 5. > > I think that we can have the autovacuum_max_parallel_workers value on > shmem, and only the launcher process can modify its value if the GUC > is changed. Autovacuum workers simply increase or decrease the > av_freeParallelWorkers within the range of 0 and the > autovacuum_max_parallel_workers value on shmem. When changing > autovacuum_max_parallel_workers and av_freeParallelWorkers values on > shmem, the launcher process calculates the number of workers reserved > at that time and calculate the new av_freeParallelWorkers value by > subtracting the new autovacuum_max_parallel_workers by the number of > reserved workers. > > --- > +AutoVacuumReserveParallelWorkers(int nworkers) > +{ > + int can_launch; > > How about renaming it to 'nreserved' or something? can_launch looks > like it's a boolean variable to indicate whether the process can > launch workers. While testing the patch, I found there are other two problems: 1. when an autovacuum worker who reserved workers fails with an error, the reserved workers are not released. I think we need to ensure that all reserved workers are surely released at the end of vacuum even with an error. 2. when an autovacuum worker (not parallel vacuum worker) who uses parallel vacuum gets SIGHUP, it errors out with the error message "parameter "max_stack_depth" cannot be set during a parallel operation". Autovacuum checks the configuration file reload in vacuum_delay_point(), and while reloading the configuration file, it attempts to set max_stack_depth in InitializeGUCOptionsFromEnvironment() (which is called by ProcessConfigFileInternal()). However, it cannot change max_stack_depth since the worker is in parallel mode but max_stack_depth doesn't have GUC_ALLOW_IN_PARALLEL flag. This doesn't happen in regular backends who are using parallel queries because they check the configuration file reload at the end of each SQL command. Regards, --=20 Masahiko Sawada Amazon Web Services: https://aws.amazon.com