Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w70K2-004ref-2e for pgsql-hackers@arkaria.postgresql.org; Mon, 30 Mar 2026 00:17:39 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w70K1-0006tt-0z for pgsql-hackers@arkaria.postgresql.org; Mon, 30 Mar 2026 00:17:37 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w70K0-0006tf-25 for pgsql-hackers@lists.postgresql.org; Mon, 30 Mar 2026 00:17:37 +0000 Received: from mail-vk1-xa2e.google.com ([2607:f8b0:4864:20::a2e]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w70Jy-00000001igY-0Bf7 for pgsql-hackers@lists.postgresql.org; Mon, 30 Mar 2026 00:17:36 +0000 Received: by mail-vk1-xa2e.google.com with SMTP id 71dfb90a1353d-56b7e5dfd25so3737035e0c.1 for ; Sun, 29 Mar 2026 17:17:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774829853; cv=none; d=google.com; s=arc-20240605; b=aDnWwndBTjoX1HYXBS91rzJj/Jxv1D/CPZ50AiKVpBXdz5QPm7VMnV1lYMbeX/bwrk XRkWqzKXHE/TuhN9ZWCNm/PBKEigaJYEbjliIj4gUBVRjKCns0AypO+kq6CinJ/4OQCA 1vqOAsNe4ncCfn/KaLKe4lLddP9jstfVK9mwJYiSwFcF0BANDh3iKuBJKnsD6zaaZ9M4 mWQxb8oWbu11n3mQl1Hr1jF/99BhFXJGet/3xeb/M1/HBFR4gtdPsQLJGAPkvWAjKl+n 5SK/4r3GDBDmSjfLEHsd6t++p19qjd+E5y+4Saejow/EpumWr4vmfHroGyEG2wosqlA6 aGMw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=KHZpSqvAbRWToAr66lK9po6q4JMAxH1f2QoPJOkYSQo=; fh=d4JaqJ0ZBfTYMJ6RwhRmtcFi0hCafjSVMK13S/hbCco=; b=Vr/lCXSrY0JYLLdd1/TJ44bF96Zujx69w58v3ysQdxHSxj9CfG5XKpcjBp2PyWVCwm uDhHs/uBBCHnPe/wJSAour/KDgPS96GaNGgQ+lA5CC1rFHElIFWFWfJ2+vgsWvL3gbq+ +p0ywUusYJSnVkuVE5NYTpyiDuJSkfawgGhSDJYcVZdsVyCWFyofvF9QbiLIuiTl+2ox FeE6xn+XySgrNgVx/HY8gKgInyW4cdoWfgFEpFA6p4EHu8pM4akn/NeNw3iCzFE0A5ds dQVZAy7YUWYTxDpZxUDh6maWjws4gzJdK0gPHU0MJ1pnlc3xLYif4HIL9oyjNXmpnOSV Xsew==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1774829853; x=1775434653; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=KHZpSqvAbRWToAr66lK9po6q4JMAxH1f2QoPJOkYSQo=; b=akT3S1StvOweIuT2sntV8d6arBtHbe17wLBGUBpSFXAeloMwrA4TA5ZyePTCiaSyc7 scy8dEaX2ihoOaPzin67/ZnjkWdejGId6Ot7BZ7GDCwY8GOHIYLKWJ7S8v6/heAOP4aZ I60XCMAI0xoU62TxDwhJbB/ewXgH1cxRapNvabI/t5iPsy/3XOYqw0nEU2Yd6BKnCYv/ zBPGqPjsiDxBAtJoKnG2fWsewmjEJQ13oqije2sfcyIltanV6kfIo/4F02vFsTBtvF4z 6h7LV7lneBEEq1idkOPw5/ODrRW8zwHpvB3+pnsprtO7ytBS7E807ldEcvepha7w4hwm LCQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774829853; x=1775434653; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=KHZpSqvAbRWToAr66lK9po6q4JMAxH1f2QoPJOkYSQo=; b=sUVxHyFaWRvlIfDw1lWs9o4OH+F46Gd7cOiq45rUUEYGB5cjFjUsQ+WKK5Nvrxz2W6 nh457As1pye+GIa13IWx8wk8IKI9vRVk1dJOzeMV2KabFfcUILcZlUqkfFOOyUn5RfEw itCPrYJzZgzbXnuiw1/h4ZHNtop2W6zkyzU/QjkAE6LGBN4pF7sXDgiDRlkB1XzGOeVk hlSDTpAIj025LrZIhHbSuJNtuwXhWUvLrl+GD1EdydrfKsKMn3lCNlgx9NWI/lbhE1EY qZZJu+wem6KGJI7KyUPhJTVFTB9JRBiyE6bboo9TEd7bMsbp6KL1iAZMT+rFfXujd0Jq E5YQ== X-Forwarded-Encrypted: i=1; AJvYcCVyh4WgmoKHC7cfcGQKte53snGdrFbKOVGenKqPsFKCmBq1iAdfHxgTq/4YxssPV4O4EpTOMxbnJR8RL6/W@lists.postgresql.org X-Gm-Message-State: AOJu0YwpVqPn7yQ56ON9IYfBYqhH9C5zHGZMce9QokLbJn3iHBv4840G PqN2y0QO2XjRiUep8d7QjkqnpDsK0B4hDo2lu21UL16R3w0HRCjzFBhEuMoFE9K0YSt38uvb+ex 4reLj1eSaowuUJ0i7jPVtftBq1/B7k+I= X-Gm-Gg: ATEYQzxOUvsJFh/b3sQim3uN6uLgFs8onVWkL3hfwvvWeEn+drONdyDWoFlM+5f/07M dFZi05VUGchrcFLu8ovtuHL9Q96QlDEcxtYip6kCrSekGhmx/EbVTB0vXqosTQ9T2ci5w0zsxHL fmJTkkKgfoYsuWxO3cjxksaLM0CAGiMJN7Ag62D8LI2pzvKm72jomscb2beHaMVPMUu+dF5h1Yc CEvmRZxqORAb2hWnDpxJ9SfgvqpejbaeGF+fXXgzFVJeQETGnLB1J4PGGAn7MrAa5q7LbgkGT9m UZy9us8= X-Received: by 2002:a05:6122:6589:b0:56a:8d44:97e3 with SMTP id 71dfb90a1353d-56d4ba3b3b0mr3572583e0c.5.1774829853219; Sun, 29 Mar 2026 17:17:33 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: SATYANARAYANA NARLAPURAM Date: Sun, 29 Mar 2026 17:17:21 -0700 X-Gm-Features: AQROBzCcAw1W15Bz_CR7fbwNYjdQ2dowkoDkoVZs5qsLwuGenS7u65f8npuBldM Message-ID: Subject: Re: POC: Parallel processing of indexes in autovacuum To: Daniil Davydov <3danissimo@gmail.com> Cc: Bharath Rupireddy , Masahiko Sawada , Sami Imseih , Alexander Korotkov , Matheus Alcantara , Maxim Orlov , Postgres hackers Content-Type: multipart/alternative; boundary="000000000000f01fd9064e32c6e7" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f01fd9064e32c6e7 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi On Sat, Mar 28, 2026 at 4:11=E2=80=AFAM Daniil Davydov <3danissimo@gmail.co= m> wrote: > Hi, > > On Thu, Mar 26, 2026 at 5:43=E2=80=AFAM Masahiko Sawada > wrote: > > > > On Wed, Mar 25, 2026 at 12:45=E2=80=AFAM Daniil Davydov <3danissimo@gma= il.com> > wrote: > > > > > > Searching for arguments in > > > favor of opt-in style, I asked for help from another person who has > been > > > managing the setup of highload systems for decades. He promised to > share his > > > opinion next week. > > > > Given that we have one and half weeks before the feature freeze, I > > think it's better to complete the project first before waiting for > > his/her comments next week. Even if we finish this feature with the > > opt-out style, we can hear more opinions on it and change the default > > behavior as the change would be privial. What do you think? > > > > Sure, if we can change the default value after the feature freeze, I don'= t > mind leaving our parameter in opt-out style by now. > > > I've squashed all patches except for the documentation patch as I > > assume you're working on it. The attached fixup patch contains several > > changes: using opt-out style, comment improvements, and fixing typos > > etc. > > > > Thank you very much for the proposed fixes! > I like the way you have changed nparallel_workers calculation > (autovacuum.c). > Forcing parallel workers to always read shared cost params at the first > time > is a good decision. All comments changes are also LGTM. > > The only place that I have changed is reloptions.c : > As you have explained, it is not appropriate to use the "overrides" wordi= ng > in the reloption's description, so I decided to return an old one. > > On Fri, Mar 27, 2026 at 10:54=E2=80=AFAM Bharath Rupireddy > wrote: > > > > Hi, > > > > On Wed, Mar 25, 2026 at 3:43=E2=80=AFPM Masahiko Sawada > wrote: > > > > > > Given that we have one and half weeks before the feature freeze, I > > > think it's better to complete the project first before waiting for > > > his/her comments next week. Even if we finish this feature with the > > > opt-out style, we can hear more opinions on it and change the default > > > behavior as the change would be privial. What do you think? > > > > > > I've squashed all patches except for the documentation patch as I > > > assume you're working on it. The attached fixup patch contains severa= l > > > changes: using opt-out style, comment improvements, and fixing typos > > > etc. > > > > +1 for enabling this feature by default. When enough CPU is available, > > vacuuming multiple indexes of a table in parallel in autovacuum > > definitely speeds things up. > > Yes, for sure. But I have concerns that enabling parallel a/v for everyon= e > will cause the parallel workers shortage during processing of the most hu= ge > tables. > > > Thank you for sending the latest patches. I quickly reviewed the v31 > > patches. Here are some comments. > > > > 1/ + {"autovacuum_parallel_workers", RELOPT_TYPE_INT, > > > > I haven't looked at the whole thread, but do we all think we need this > > as a relopt? IMHO, we can wait for field experience and introduce this > > later. > > I think that we should leave both reloption and the config parameter. > Getting rid from the reloption will greatly reduce the ability of users t= o > tune this feature. I'm afraid that this may lead to people not using > parallel > autovacuum. > > > I'm having a hard time finding a use-case where one wants to > > disable the indexes at the table level. If there was already an > > agreement, I agree to commit to that decision. > > You can read discussion from [1] to the current message in order to dive > into > the question. > > To make the long story short, I think that the most common use case for > this > feature is allowing parallel a/v for 2-3 tables, each of which has ~100 > indexes. The rest of the tables do not require parallel processing (at > least > it's a much lower priority for them). > > At the same time, Masahiko-san thinks that only the system should decide > which > tables will be processed in parallel. System's decision should be based o= n > the > number of indexes and a few other config parameters (e.g. > min_parallel_index_scan_size). Thus, possibly many tables will be able to > be > processed in parallel. > > (Both opinions are pretty simplified). > > > > > 2/ + /* > > + * If 'true' then we are running parallel autovacuum. Otherwise, we > are > > + * running parallel maintenence VACUUM. > > + */ > > + bool is_autovacuum; > > + > > > > The variable name looks a bit confusing. How about we rely on > > AmAutoVacuumWorkerProcess() and avoid the bool in shared memory? > > This variable is needed for parallel workers, which are taken from the > bgworkers pool. I.e. AmAutovacuumWorker() will return 'false' for them. > We need the "is_autovacuum" variable in order to understand exactly what > this > process was started for (VACUUM PARALLEL or parallel autovacuum). > > > Thanks everyone for the review! > Please, see an updated set of patches : > As I promised, I created a dedicated chapter for Parallel Vacuum > description. > Both maintenance VACUUM and autovacuum now refer to this chapter. > > I am pretty inexperienced in the documentation writing, so forgive me if > something is out of code style. > > [1] > https://www.postgresql.org/message-id/CAJDiXggH1bW%3D4n%2B55CGLvs_sRU4SYN= XwYLZ37wvJ5H_3yURSPw%40mail.gmail.com Thank you for working on this, very useful feature. Sharing a few thoughts: 1. Shouldn't we also cap by max_parallel_workers to avoid wasting DSM resources in parallel_vacuum_compute_workers? 2. Is it intentional that other autovacuum workers not yield cost limits to the parallel auto vacuum workers? Cost limits are distributed first equally to the autovacuum workers. and then they share that. Therefore, parallel workers will be heavily throttled. IIUC, this problem doesn't exist with manual vacuum. If we don't fix this, at least we should document this. 3. Additionally, is there a point where, based on the cost limits, launching additional workers becomes counterproductive compared to running fewer workers and preventing it? 4. Would it make sense to add a table level override to disable parallelism or set parallel worker count? I ran some perf tests to show the improvements with parallel vacuum and shared below. System Configuration -------------------- Hardware: CPU: 16 cores RAM: 128 GB Storage: NVMe SSDs OS: Ubuntu Linux Workload Description -------------------- Table: avtest - 5,000,000 rows - 9 columns: id (bigint PK), col1-col5 (int), col6 (text), col7 (timestamp), padding (text, 50 bytes) - 8 indexes: avtest_pkey (col: id) 107 MB idx_av_col7 (col: col7) 107 MB idx_av_col2 (col: col2) 56 MB idx_av_col4 (col: col4) 56 MB idx_av_col5 (col: col5) 56 MB idx_av_col1 (col: col1) 56 MB idx_av_col3 (col: col3) 56 MB idx_av_col6 (col: col6) 35 MB - Total size: 1171 MB Each test iteration: 1. Delete 2,000,000 rows (40%) using: DELETE WHERE id % 5 IN (1, 2) 2. CHECKPOINT to flush dirty pages 3. Trigger autovacuum by setting autovacuum_vacuum_threshold =3D 100 and autovacuum_vacuum_scale_factor =3D 0 on the table 4. Wait for autovacuum to complete (detected via server log) 5. Re-insert the deleted rows and VACUUM to restore the table for the next run Test Methodology ---------------- Worker configurations tested: 0, 2, 4, 7 parallel workers (7 is the maximum: nindexes - 1, since the leader always handles one index) Two experiments were run with different cost-based vacuum delay settings: Experiment A: cost_limit=3D200, cost_delay=3D2ms Experiment B: cost_limit=3D60, cost_delay=3D2ms Common server settings for both experiments: shared_buffers =3D 120 GB (entire dataset fits in shared buffers) maintenance_work_mem =3D 1 GB max_wal_size =3D 100 GB (prevents checkpoints during vacuum) min_wal_size =3D 10 GB checkpoint_timeout =3D 1 hour (prevents time-based checkpoints) wal_buffers =3D 128 MB max_parallel_workers =3D 16 max_worker_processes =3D 24 autovacuum_naptime =3D 1s Between every single run: 1. PostgreSQL server is fully stopped (pg_ctl stop -m fast) 2. OS page cache is dropped (echo 3 > /proc/sys/vm/drop_caches) 3. Server is restarted with a clean log file 4. After DELETE and CHECKPOINT, the server is stopped again, OS caches dropped again, and the server restarted -- so vacuum starts fully cold 5. The autovacuum_max_parallel_workers GUC is reloaded via pg_ctl reload Each configuration was tested for 5 iterations. Timing is extracted from the PostgreSQL server log "system usage" line that autovacuum emits at completion. This reports elapsed wall-clock time and CP= U time for the autovacuum worker leader process. Results: Experiment A (cost_limit=3D200, cost_delay=3D2ms) ------------------------------------------------------ Workers Iter1 Iter2 Iter3 Iter4 Iter5 Avg(s) Speedup ------- ------ ------ ------ ------ ------ ------ ------- 0 66.21 79.11 66.27 77.11 66.30 71.00 1.00x 2 66.55 53.27 52.66 55.74 55.71 56.78 1.25x 4 51.50 51.74 65.07 52.06 70.25 58.12 1.22x 7 50.05 50.35 50.04 50.12 50.07 50.12 1.41x CPU usage (leader process only): Workers Avg CPU user Avg CPU sys ------- ----------- ---------- 0 3.04s 1.70s 2 1.24s 1.50s 4 0.78s 1.49s 7 0.79s 1.48s Results: Experiment B (cost_limit=3D60, cost_delay=3D2ms) ----------------------------------------------------- Workers Iter1 Iter2 Iter3 Iter4 Iter5 Avg(s) Speedup ------- ------ ------ ------ ------ ------ ------ ------- 0 199.00 195.26 191.44 191.90 191.67 193.85 1.00x 2 160.68 181.33 176.85 167.84 159.47 169.23 1.14x 4 154.02 165.02 174.33 164.16 156.53 162.81 1.19x 7 148.49 158.68 160.66 154.37 149.20 154.28 1.25x CPU usage (leader process only): Workers Avg CPU user Avg CPU sys ------- ----------- ---------- 0 3.06s 1.90s 2 1.28s 1.72s 4 0.80s 1.69s 7 0.82s 1.68s *Observations:* 1. Parallel autovacuum provides consistent speedup. With cost_limit=3D200 a= nd 7 workers, vacuum completes 1.41x faster (71s -> 50s). With cost_limit=3D60, the speedup is 1.25x (194s -> 154s). 2. I see the benefit comes from parallelizing index vacuum. With 8 indexes totaling ~530 MB, parallel workers scan indexes concurrently instead of the leade= r scanning them one by one. The leader's CPU user time drops from ~3s to ~0.8s as index work is offloaded Thanks, Satya --000000000000f01fd9064e32c6e7 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi

On Sat, Mar 28, 2= 026 at 4:11=E2=80=AFAM Daniil Davydov <3danissimo@gmail.com> wrote:
Hi,

On Thu, Mar 26, 2026 at 5:43=E2=80=AFAM Masahiko Sawada <sawada.mshk@gmail.com> w= rote:
>
> On Wed, Mar 25, 2026 at 12:45=E2=80=AFAM Daniil Davydov <3danissimo@gmail.com>= ; wrote:
> >
> >=C2=A0 Searching for arguments in
> > favor of opt-in style, I asked for help from another person who h= as been
> > managing the setup of highload systems for decades. He promised t= o share his
> > opinion next week.
>
> Given that we have one and half weeks before the feature freeze, I
> think it's better to complete the project first before waiting for=
> his/her comments next week. Even if we finish this feature with the > opt-out style, we can hear more opinions on it and change the default<= br> > behavior as the change would be privial. What do you think?
>

Sure, if we can change the default value after the feature freeze, I don= 9;t
mind leaving our parameter in opt-out style by now.

> I've squashed all patches except for the documentation patch as I<= br> > assume you're working on it. The attached fixup patch contains sev= eral
> changes: using opt-out style, comment improvements, and fixing typos > etc.
>

Thank you very much for the proposed fixes!
I like the way you have changed nparallel_workers calculation (autovacuum.c= ).
Forcing parallel workers to always read shared cost params at the first tim= e
is a good decision. All comments changes are also LGTM.

The only place that I have changed is reloptions.c :
As you have explained, it is not appropriate to use the "overrides&quo= t; wording
in the reloption's description, so I decided to return an old one.

On Fri, Mar 27, 2026 at 10:54=E2=80=AFAM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> On Wed, Mar 25, 2026 at 3:43=E2=80=AFPM Masahiko Sawada <sawada.mshk@gmail.com&= gt; wrote:
> >
> > Given that we have one and half weeks before the feature freeze, = I
> > think it's better to complete the project first before waitin= g for
> > his/her comments next week. Even if we finish this feature with t= he
> > opt-out style, we can hear more opinions on it and change the def= ault
> > behavior as the change would be privial. What do you think?
> >
> > I've squashed all patches except for the documentation patch = as I
> > assume you're working on it. The attached fixup patch contain= s several
> > changes: using opt-out style, comment improvements, and fixing ty= pos
> > etc.
>
> +1 for enabling this feature by default. When enough CPU is available,=
> vacuuming multiple indexes of a table in parallel in autovacuum
> definitely speeds things up.

Yes, for sure. But I have concerns that enabling parallel a/v for everyone<= br> will cause the parallel workers shortage during processing of the most huge=
tables.

> Thank you for sending the latest patches. I quickly reviewed the v31 > patches. Here are some comments.
>
> 1/ +=C2=A0 =C2=A0 =C2=A0 =C2=A0{"autovacuum_parallel_workers"= ;, RELOPT_TYPE_INT,
>
> I haven't looked at the whole thread, but do we all think we need = this
> as a relopt? IMHO, we can wait for field experience and introduce this=
> later.

I think that we should leave both reloption and the config parameter.
Getting rid from the reloption will greatly reduce the ability of users to<= br> tune this feature. I'm afraid that this may lead to people not using pa= rallel
autovacuum.

> I'm having a hard time finding a use-case where one wants to
> disable the indexes at the table level. If there was already an
> agreement, I agree to commit to that decision.

You can read discussion from [1] to the current message in order to dive in= to
the question.

To make the long story short, I think that the most common use case for thi= s
feature is allowing parallel a/v for 2-3 tables, each of which has ~100
indexes. The rest of the tables do not require parallel processing (at leas= t
it's a much lower priority for them).

At the same time, Masahiko-san thinks that only the system should decide wh= ich
tables will be processed in parallel. System's decision should be based= on the
number of indexes and a few other config parameters (e.g.
min_parallel_index_scan_size). Thus, possibly many tables will be able to b= e
processed in parallel.

(Both opinions are pretty simplified).

>
> 2/=C2=A0 +=C2=A0 =C2=A0/*
> +=C2=A0 =C2=A0 * If 'true' then we are running parallel autova= cuum. Otherwise, we are
> +=C2=A0 =C2=A0 * running parallel maintenence VACUUM.
> +=C2=A0 =C2=A0 */
> +=C2=A0 =C2=A0bool=C2=A0 =C2=A0 =C2=A0 =C2=A0 is_autovacuum;
> +
>
> The variable name looks a bit confusing. How about we rely on
> AmAutoVacuumWorkerProcess() and avoid the bool in shared memory?

This variable is needed for parallel workers, which are taken from the
bgworkers pool. I.e. AmAutovacuumWorker() will return 'false' for t= hem.
We need the "is_autovacuum" variable in order to understand exact= ly what this
process was started for (VACUUM PARALLEL or parallel autovacuum).


Thanks everyone for the review!
Please, see an updated set of patches :
As I promised, I created a dedicated chapter for Parallel Vacuum descriptio= n.
Both maintenance VACUUM and autovacuum now refer to this chapter.

I am pretty inexperienced in the documentation writing, so forgive me if something is out of code style.

[1] https://www.postgresql.org/message-id/CAJDiXggH1bW%3D4n%2B55C= GLvs_sRU4SYNXwYLZ37wvJ5H_3yURSPw%40mail.gmail.com

=
Thank you for working on this, very useful feature. Sharing a fe= w thoughts:

1. Shouldn't we also cap by max_pa= rallel_workers to avoid wasting DSM resources in parallel_vacuum_compute_wo= rkers?
2. Is it intentional that other autovacuum workers not yie= ld cost limits to the parallel auto vacuum workers? Cost limits are distrib= uted first equally to the autovacuum workers.
and then they share= that. Therefore, parallel workers will be heavily throttled. IIUC, this pr= oblem doesn't exist with=C2=A0manual vacuum.
=C2=A0If we don&= #39;t fix this, at least we should document this.
3. Additionally= , is there a point where, based on the cost limits, launching additional wo= rkers becomes counterproductive compared to running fewer workers and preve= nting it?
4. Would it make sense to add a table level override to= disable parallelism or set parallel worker count?


I ran some perf tests to show the improvements with paralle= l vacuum and shared below.
System Configuration
--------------= ------
Hardware:
=C2=A0 CPU: 16 cores
=C2=A0 RAM: 128 GB
=C2=A0= Storage: NVMe SSDs
=C2=A0 OS: Ubuntu Linux

Workload Description<= br>--------------------
Table: avtest
=C2=A0 - 5,000,000 rows
=C2= =A0 - 9 columns: id (bigint PK), col1-col5 (int), col6 (text), col7 (timest= amp),
=C2=A0 =C2=A0 padding (text, 50 bytes)
=C2=A0 - 8 indexes:
= =C2=A0 =C2=A0 =C2=A0 avtest_pkey =C2=A0(col: id) =C2=A0 =C2=A0 =C2=A0 =C2= =A0107 MB
=C2=A0 =C2=A0 =C2=A0 idx_av_col7 =C2=A0(col: col7) =C2=A0 =C2= =A0 =C2=A0107 MB
=C2=A0 =C2=A0 =C2=A0 idx_av_col2 =C2=A0(col: col2) =C2= =A0 =C2=A0 =C2=A0 56 MB
=C2=A0 =C2=A0 =C2=A0 idx_av_col4 =C2=A0(col: col= 4) =C2=A0 =C2=A0 =C2=A0 56 MB
=C2=A0 =C2=A0 =C2=A0 idx_av_col5 =C2=A0(co= l: col5) =C2=A0 =C2=A0 =C2=A0 56 MB
=C2=A0 =C2=A0 =C2=A0 idx_av_col1 =C2= =A0(col: col1) =C2=A0 =C2=A0 =C2=A0 56 MB
=C2=A0 =C2=A0 =C2=A0 idx_av_co= l3 =C2=A0(col: col3) =C2=A0 =C2=A0 =C2=A0 56 MB
=C2=A0 =C2=A0 =C2=A0 idx= _av_col6 =C2=A0(col: col6) =C2=A0 =C2=A0 =C2=A0 35 MB
=C2=A0 - Total siz= e: 1171 MB

Each test iteration:
=C2=A0 1. Delete 2,000,000 rows (= 40%) using: DELETE WHERE id % 5 IN (1, 2)
=C2=A0 2. CHECKPOINT to flush = dirty pages
=C2=A0 3. Trigger autovacuum by setting autovacuum_vacuum_th= reshold =3D 100 and
=C2=A0 =C2=A0 =C2=A0autovacuum_vacuum_scale_factor = =3D 0 on the table
=C2=A0 4. Wait for autovacuum to complete (detected v= ia server log)
=C2=A0 5. Re-insert the deleted rows and VACUUM to restor= e the table for the next run


Test Methodology
---------------= -
Worker configurations tested: 0, 2, 4, 7 parallel workers
=C2=A0 (7= is the maximum: nindexes - 1, since the leader always handles one index)
Two experiments were run with different cost-based vacuum delay setti= ngs:

=C2=A0 Experiment A: cost_limit=3D200, cost_delay=3D2ms
=C2= =A0 Experiment B: cost_limit=3D60, =C2=A0cost_delay=3D2ms

Common ser= ver settings for both experiments:
=C2=A0 shared_buffers =C2=A0 =C2=A0 = =C2=A0 =C2=A0=3D 120 GB =C2=A0(entire dataset fits in shared buffers)
= =C2=A0 maintenance_work_mem =C2=A0=3D 1 GB
=C2=A0 max_wal_size =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0=3D 100 GB =C2=A0(prevents checkpoints during va= cuum)
=C2=A0 min_wal_size =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=3D 10 GB=C2=A0 checkpoint_timeout =C2=A0 =C2=A0=3D 1 hour =C2=A0(prevents time-bas= ed checkpoints)
=C2=A0 wal_buffers =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =3D 128 MB
=C2=A0 max_parallel_workers =C2=A0=3D 16
=C2=A0 max_worker= _processes =C2=A0=3D 24
=C2=A0 autovacuum_naptime =C2=A0 =C2=A0=3D 1s
Between every single run:
=C2=A0 1. PostgreSQL server is fully stop= ped (pg_ctl stop -m fast)
=C2=A0 2. OS page cache is dropped (echo 3 >= ; /proc/sys/vm/drop_caches)
=C2=A0 3. Server is restarted with a clean l= og file
=C2=A0 4. After DELETE and CHECKPOINT, the server is stopped aga= in, OS caches
=C2=A0 =C2=A0 =C2=A0dropped again, and the server restarte= d -- so vacuum starts fully cold
=C2=A0 5. The autovacuum_max_parallel_w= orkers GUC is reloaded via pg_ctl reload

Each configuration was test= ed for 5 iterations.

Timing is extracted from the PostgreSQL server = log "system usage" line that
autovacuum emits at completion. T= his reports elapsed wall-clock time and CPU
time for the autovacuum work= er leader process.


Results: Experiment A (cost_limit=3D200, cost= _delay=3D2ms)
------------------------------------------------------
=
Workers =C2=A0Iter1 =C2=A0 =C2=A0Iter2 =C2=A0 =C2=A0Iter3 =C2=A0 =C2=A0= Iter4 =C2=A0 =C2=A0Iter5 =C2=A0 =C2=A0Avg(s) =C2=A0Speedup
------- =C2= =A0------ =C2=A0 ------ =C2=A0 ------ =C2=A0 ------ =C2=A0 ------ =C2=A0 --= ---- =C2=A0-------
0 =C2=A0 =C2=A0 =C2=A0 =C2=A066.21 =C2=A0 =C2=A079.11= =C2=A0 =C2=A066.27 =C2=A0 =C2=A077.11 =C2=A0 =C2=A066.30 =C2=A0 =C2=A071.0= 0 =C2=A0 1.00x
2 =C2=A0 =C2=A0 =C2=A0 =C2=A066.55 =C2=A0 =C2=A053.27 =C2= =A0 =C2=A052.66 =C2=A0 =C2=A055.74 =C2=A0 =C2=A055.71 =C2=A0 =C2=A056.78 = =C2=A0 1.25x
4 =C2=A0 =C2=A0 =C2=A0 =C2=A051.50 =C2=A0 =C2=A051.74 =C2= =A0 =C2=A065.07 =C2=A0 =C2=A052.06 =C2=A0 =C2=A070.25 =C2=A0 =C2=A058.12 = =C2=A0 1.22x
7 =C2=A0 =C2=A0 =C2=A0 =C2=A050.05 =C2=A0 =C2=A050.35 =C2= =A0 =C2=A050.04 =C2=A0 =C2=A050.12 =C2=A0 =C2=A050.07 =C2=A0 =C2=A050.12 = =C2=A0 1.41x

CPU usage (leader process only):
Workers =C2=A0Avg C= PU user =C2=A0Avg CPU sys
------- =C2=A0----------- =C2=A0 ----------0 =C2=A0 =C2=A0 =C2=A0 =C2=A03.04s =C2=A0 =C2=A0 =C2=A0 =C2=A0 1.70s
2 = =C2=A0 =C2=A0 =C2=A0 =C2=A01.24s =C2=A0 =C2=A0 =C2=A0 =C2=A0 1.50s
4 =C2= =A0 =C2=A0 =C2=A0 =C2=A00.78s =C2=A0 =C2=A0 =C2=A0 =C2=A0 1.49s
7 =C2=A0= =C2=A0 =C2=A0 =C2=A00.79s =C2=A0 =C2=A0 =C2=A0 =C2=A0 1.48s


Res= ults: Experiment B (cost_limit=3D60, cost_delay=3D2ms)
-----------------= ------------------------------------

Workers =C2=A0Iter1 =C2=A0 =C2= =A0Iter2 =C2=A0 =C2=A0Iter3 =C2=A0 =C2=A0Iter4 =C2=A0 =C2=A0Iter5 =C2=A0 = =C2=A0Avg(s) =C2=A0Speedup
------- =C2=A0------ =C2=A0 ------ =C2=A0 ---= --- =C2=A0 ------ =C2=A0 ------ =C2=A0 ------ =C2=A0-------
0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0199.00 =C2=A0 195.26 =C2=A0 191.44 =C2=A0 191.90 =C2=A0 19= 1.67 =C2=A0 193.85 =C2=A01.00x
2 =C2=A0 =C2=A0 =C2=A0 =C2=A0160.68 =C2= =A0 181.33 =C2=A0 176.85 =C2=A0 167.84 =C2=A0 159.47 =C2=A0 169.23 =C2=A01.= 14x
4 =C2=A0 =C2=A0 =C2=A0 =C2=A0154.02 =C2=A0 165.02 =C2=A0 174.33 =C2= =A0 164.16 =C2=A0 156.53 =C2=A0 162.81 =C2=A01.19x
7 =C2=A0 =C2=A0 =C2= =A0 =C2=A0148.49 =C2=A0 158.68 =C2=A0 160.66 =C2=A0 154.37 =C2=A0 149.20 = =C2=A0 154.28 =C2=A01.25x

CPU usage (leader process only):
Worker= s =C2=A0Avg CPU user =C2=A0Avg CPU sys
------- =C2=A0----------- =C2=A0 = ----------
0 =C2=A0 =C2=A0 =C2=A0 =C2=A03.06s =C2=A0 =C2=A0 =C2=A0 =C2= =A0 1.90s
2 =C2=A0 =C2=A0 =C2=A0 =C2=A01.28s =C2=A0 =C2=A0 =C2=A0 =C2=A0= 1.72s
4 =C2=A0 =C2=A0 =C2=A0 =C2=A00.80s =C2=A0 =C2=A0 =C2=A0 =C2=A0 1.= 69s
7 =C2=A0 =C2=A0 =C2=A0 =C2=A00.82s =C2=A0 =C2=A0 =C2=A0 =C2=A0 1.68s=



Observations:

1. Parallel autovacuum provides= consistent speedup. With cost_limit=3D200 and
=C2=A0 =C2=A07 workers, v= acuum completes 1.41x faster (71s -> 50s). With cost_limit=3D60,
=C2= =A0 =C2=A0the speedup is 1.25x (194s -> 154s).
2. I see the benefit c= omes from parallelizing index vacuum. With 8 indexes totaling
=C2=A0 =C2= =A0~530 MB, parallel workers scan indexes concurrently instead of the leade= r
=C2=A0 =C2=A0scanning them one by one. The leader's CPU user time = drops from ~3s to
=C2=A0 =C2=A0~0.8s as index work is offloaded





Thanks,
Satya



--000000000000f01fd9064e32c6e7--