Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w8RVW-000XIx-1e for pgsql-hackers@arkaria.postgresql.org; Thu, 02 Apr 2026 23:31:27 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w8RVU-008jXG-39 for pgsql-hackers@arkaria.postgresql.org; Thu, 02 Apr 2026 23:31:25 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w8RVU-008jX7-1O for pgsql-hackers@lists.postgresql.org; Thu, 02 Apr 2026 23:31:25 +0000 Received: from mail-pl1-x634.google.com ([2607:f8b0:4864:20::634]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w8RVS-00000000GP8-1Ig7 for pgsql-hackers@lists.postgresql.org; Thu, 02 Apr 2026 23:31:23 +0000 Received: by mail-pl1-x634.google.com with SMTP id d9443c01a7336-2b25cf1b5f0so8575485ad.3 for ; Thu, 02 Apr 2026 16:31:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1775172681; cv=none; d=google.com; s=arc-20240605; b=Za0iP2PGhSPjVLz18fkjIjmGIgz/4v1jM57gRY1zPv6S2z9Dl+NhfE6uWBH9BkzXst yG9p0nhRIT+d8VIkj+xj/bnF2zrcLEnqUnyYo5lEuYWzwH+6zhUBFnJC9TbGllji5GfD NrOZlXgVPCYyUa6+yKsn225ieoK4ko8NhxjoYghr8Jx6LnOcAH3kklDMl8j3j02gmfkG noSrpk91FNQNTTk2e8PSh3qYgV5m7jC6Du7niZPshZy86MspMWtwVRSH0dSLgcCvBkJ8 isLkA7trUg0Q5fViNqqTJ56v7lFKMi53Efzll9WYfQE4lpAJZ+i3pQd3bkyYO3yrWMCV KZFg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=VlC/N3APT6QD/wQvTtB8/in/onMruK5tGBl/x+h6Q7I=; fh=6PMGaWVwLlX0ATkJDohopBzK1ZgBhF8PDOIBc3O/mh8=; b=KjxdUOgYGg/FoNhqZd0hObH/HSuG51GSt9I58cij/z2U9ry7CdZ+kURJf9rP1fp4sy EQLMczYzhTSsBNdVOGlUMZMCkuJ6URkw5h5jF6ihQt/2vdSQuHMiOqefE9AP2x90V29S Z2C8au5uEWoTFAs3jnY/o4nfbdH+PI2/ph7m2t6zA2+2LKbGUcmjNoPemeg1k0pa/o2e 4TOY9n+bQzGskZ3LG2mUHRfWE+kFd0znlE88qCXZdjAaWC7CSPrvSyjumh5KMxGsq9BE dPZmkZGEGcVa70ki8S5MYYYU1hzqrPvhmXJVb2xl1PlKZE2X+S4szJJL+b1ptgYkBBVa tMsg==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1775172681; x=1775777481; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=VlC/N3APT6QD/wQvTtB8/in/onMruK5tGBl/x+h6Q7I=; b=mh1b5QOUCCSViUTnJLIjJfrnPAZKsGjt6l4OGQt/BlwRy/M9Yd8/0U5oTcyS59aCpH wc1ARqDmCbO+fVov2VycoSkqwc1Ae9N33T5kNVc23FHb8yCnesGwzG3iqIL26nwk9neM k/cr/IfgKF+2xF0+1DZpQMEyIcG4s+dEc+JkcuNQeg9ipGZQ3SSVCGSO5aNF+N2w4zSh pqXXg344l7MnXcLhh6vMNKA2A3ts8uCPiErPJPJ969HRRBcYKyE0M/Daz54yRaaB4PDc zD9MSj9aHxV9YwrOgk/ZUzRq49y7PEVp9EfxdMIMpFXfGqRgq5FWL5S5xD8yJcMx+qZ0 pDaQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1775172681; x=1775777481; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=VlC/N3APT6QD/wQvTtB8/in/onMruK5tGBl/x+h6Q7I=; b=FP9co3V20oTEaIYC5mhh8mJpPWIp4UTWwQPLbT2YmfnarR0OHiAirMu2smYhnROHBh fNeSjiWb7SDzLlevvNMeozqQFQbwGvL29rm3vcnYMlCTFVNTd+8ie2wlLnd/oQXzToxR JYrp5ifRHagDQYE8w9KvgXL1qepDMLllbeDLpMqREzOL+2d2xnOB161U1z0aPxlnQMBg wUhEiGyMa+RRZVHKvfBJy5mdNSN14Kfx7Mw2vdcQGKLfwOK/PM7AMmWj+loaZRmAEkDs uG2qv0jIMUV5hFgHpgickQ6DBbc6lBG16uPjrx5s+cLZdWg05J9qYHkz2WeDKAmcZswu I0Xw== X-Forwarded-Encrypted: i=1; AJvYcCUUcWaPG9gSPSULvDVxyJsx/28OoCz7IVCYRY41Q0SvuDLZ2tmz3mwpCJYMbry18PopHpHW+ryn3ddpwfIs@lists.postgresql.org X-Gm-Message-State: AOJu0YyeRXPEcDjBN4ArttRPsjIdg4IR1nFNrpt1weZAzXMvtBseQ+qu YgxfNMPPzDcHvQy3T6TYwQiBf82UG/OcS8ifoySliDckx8thC7Wv/T2Zp/lGb55qqOFClFTct/Z jFRDIASfmd99TAIUIH4FJ+jBhHkoR1NM= X-Gm-Gg: AeBDieuOEPrsAewK5dQBwh/Oxbb7y3OVdetpKMUCDneA5Bu9Rhi3179nAk5BDlJ6Rwq /jwaKflMMp/YKnSoBcVkpJ4XDkcSrSOIJ8xQDNnk66yvbiB9CrG84YQVhXAfCO78n2yLdAZbOeM H7mRzceInjY/46ZN6OA5+K+xDj+H98AyfrmPURZHazM5GN1OJ6xX9xjB6xOo6r/g5N3N8EZR0oe bwRPGrLqbpQSYL/NTeMLmCngl2iVquf/hstok1o6VMTBeWvKGxzDpr56nQIfyfWBqRkinXN237a pRbPfOGPrbTJ09oOv8k= X-Received: by 2002:a17:903:18a:b0:2b0:9c2b:641d with SMTP id d9443c01a7336-2b28164c170mr11026975ad.2.1775172681184; Thu, 02 Apr 2026 16:31:21 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Masahiko Sawada Date: Thu, 2 Apr 2026 16:30:42 -0700 X-Gm-Features: AQROBzCRAlzLwp7X57uHcx-JpG3TjgEI6xBu9-RpRIOVzeFzOycw717ihWfoH3Y Message-ID: Subject: Re: POC: Parallel processing of indexes in autovacuum To: Alexander Korotkov Cc: SATYANARAYANA NARLAPURAM , Daniil Davydov <3danissimo@gmail.com>, Bharath Rupireddy , Sami Imseih , Matheus Alcantara , Maxim Orlov , Postgres hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Apr 2, 2026 at 4:02=E2=80=AFAM Alexander Korotkov wrote: > > Hi! > > On Wed, Apr 1, 2026 at 9:55=E2=80=AFPM Masahiko Sawada wrote: > > > > On Mon, Mar 30, 2026 at 5:14=E2=80=AFPM SATYANARAYANA NARLAPURAM > > wrote: > > > > > > Hi > > > > > > On Mon, Mar 30, 2026 at 1:44=E2=80=AFAM Daniil Davydov <3danissimo@gm= ail.com> wrote: > > >> > > >> Hi, > > >> > > >> On Mon, Mar 30, 2026 at 7:17=E2=80=AFAM SATYANARAYANA NARLAPURAM > > >> wrote: > > >> > > > >> > Thank you for working on this, very useful feature. Sharing a few = thoughts: > > >> > > > >> > 1. Shouldn't we also cap by max_parallel_workers to avoid wasting = DSM resources in parallel_vacuum_compute_workers? > > >> > > >> Actually, autovacuum_max_parallel_workers is already limited by > > >> max_parallel_workers. It is not clear for me why we allow setting th= is GUC > > >> higher than max_parallel_workers, but if this happens, I think it is= a user's > > >> misconfiguration. > > >> > > >> > 2. Is it intentional that other autovacuum workers not yield cost = limits to the parallel auto vacuum workers? Cost limits are distributed fir= st equally to the autovacuum workers. > > >> > and then they share that. Therefore, parallel workers will be heav= ily throttled. IIUC, this problem doesn't exist with manual vacuum. > > >> > If we don't fix this, at least we should document this. > > >> > > >> Parallel a/v workers inherit cost based parameters (including the > > >> vacuum_cost_limit) from the leader worker. Do you mean that this can= be too > > >> low value for parallel operation? If so, user can manually increase = the > > >> vacuum_cost_limit reloption for those tables, where parallel a/v sle= eps too > > >> much (due to cost delay). > > >> > > >> BTW, describing the cost limit propagation to the parallel a/v worke= rs is > > >> worth mentioning in the documentation. I'll add it in the next patch= version. > > >> > > >> > 3. Additionally, is there a point where, based on the cost limits,= launching additional workers becomes counterproductive compared to running= fewer workers and preventing it? > > >> > > >> I don't think that we can possibly find a universal limit that will = be > > >> appropriate for all possible configurations. By now we are using a p= retty > > >> simple formula for parallel degree calculation. Since user have seve= ral ways > > >> to affect this formula, I guess that there will be no problems with = it (except > > >> my concerns about opt-out style). > > >> > > >> > 4. Would it make sense to add a table level override to disable pa= rallelism or set parallel worker count? > > >> > > >> We already have the "autovacuum_parallel_workers" reloption that is = used as > > >> an additional limit for the number of parallel workers. In particula= r, this > > >> reloption can be used to disable parallelism at all. > > >> > > >> > > > >> > I ran some perf tests to show the improvements with parallel vacuu= m and shared below. > > >> > > >> Thank you very much! > > >> > > >> > Observations: > > >> > > > >> > 1. Parallel autovacuum provides consistent speedup. With cost_limi= t=3D200 and > > >> > 7 workers, vacuum completes 1.41x faster (71s -> 50s). With cos= t_limit=3D60, > > >> > the speedup is 1.25x (194s -> 154s). > > >> > 2. I see the benefit comes from parallelizing index vacuum. With 8= indexes totaling > > >> > ~530 MB, parallel workers scan indexes concurrently instead of = the leader > > >> > scanning them one by one. The leader's CPU user time drops from= ~3s to > > >> > ~0.8s as index work is offloaded > > >> > > > >> > > >> 1.41 speedup with 7 parallel workers may not seem like a great win, = but it is > > >> a whole time of autovacuum operation (not only index bulkdel/cleanup= ) with > > >> pretty small indexes. > > >> > > >> May I ask you to run the same test with a higher table's size (sever= al dozen > > >> gigabytes)? I think the results will be more "expressive". > > > > > > > > > I ran it with a Billion rows in a table with 8 indexes. The improveme= nt with 7 workers is 1.8x. > > > Please note that there is a fixed overhead in other vacuum steps, for= example heap scan. > > > In the environments where cost-based delay is used (the default), ben= efits will be modest > > > unless vacuum_cost_delay is set to sufficiently large value. > > > > > > Hardware: > > > CPU: Intel Xeon Platinum 8573C, 1 socket =C3=97 8 cores =C3=97 = 2 threads =3D 16 vCPUs > > > RAM: 128 GB (131,900 MB) > > > Swap: None > > > > > > Workload Description > > > > > > Table Schema: > > > CREATE TABLE avtest ( > > > id bigint PRIMARY KEY, > > > col1 int, -- random()*1e9 > > > col2 int, -- random()*1e9 > > > col3 int, -- random()*1e9 > > > col4 int, -- random()*1e9 > > > col5 int, -- random()*1e9 > > > col6 text, -- 'text_' || random()*1e6 (short text= ~10 chars) > > > col7 timestamp, -- now() - random()*365 days > > > padding text -- repeat('x', 50) > > > ) WITH (fillfactor =3D 90); > > > > > > Indexes (8 total): > > > avtest_pkey =E2=80=94 btree on (id) bigint > > > idx_av_col1 =E2=80=94 btree on (col1) int > > > idx_av_col2 =E2=80=94 btree on (col2) int > > > idx_av_col3 =E2=80=94 btree on (col3) int > > > idx_av_col4 =E2=80=94 btree on (col4) int > > > idx_av_col5 =E2=80=94 btree on (col5) int > > > idx_av_col6 =E2=80=94 btree on (col6) text > > > idx_av_col7 =E2=80=94 btree on (col7) timestamp > > > > > > Dead Tuple Generation: > > > DELETE FROM avtest WHERE id % 5 IN (1, 2); > > > This deletes exactly 40% of rows, uniformly distributed across all = pages. > > > > > > Vacuum Trigger: > > > Autovacuum is triggered naturally by lowering the threshold to 0 an= d setting > > > scale_factor to a value that causes immediate launch after the DELE= TE. > > > > > > Worker Configurations Tested: > > > 0 workers =E2=80=94 leader-only vacuum (baseline, no parallelism) > > > 2 workers =E2=80=94 leader + 2 parallel workers (3 processes total= ) > > > 4 workers =E2=80=94 leader + 4 parallel workers (5 processes total= ) > > > 7 workers =E2=80=94 leader + 7 parallel workers (8 processes total= , 1 per index) > > > > > > Dataset: > > > Rows: 1,000,000,000 > > > Heap size: 139 GB > > > Total size: 279 GB (heap + 8 indexes) > > > Dead tuples: 400,000,000 (40%) > > > > > > Index Sizes: > > > avtest_pkey 21 GB (bigint) > > > idx_av_col7 21 GB (timestamp) > > > idx_av_col1 18 GB (int) > > > idx_av_col2 18 GB (int) > > > idx_av_col3 18 GB (int) > > > idx_av_col4 18 GB (int) > > > idx_av_col5 18 GB (int) > > > idx_av_col6 7 GB (text =E2=80=94 shorter keys, smaller index) > > > Total indexes: 139 GB > > > > > > Server Settings: > > > shared_buffers =3D 96GB > > > maintenance_work_mem =3D 1GB > > > max_wal_size =3D 100GB > > > checkpoint_timeout =3D 1h > > > autovacuum_vacuum_cost_delay =3D 0ms (NO throttling) > > > autovacuum_vacuum_cost_limit =3D 1000 > > > > > > > > > Summary: > > > > > > Workers Avg(s) Min(s) Max(s) Speedup Time Saved > > > ------- ------ ------ ------ ------- ---------- > > > 0 1645.93 1645.01 1646.84 1.00x =E2=80=94 > > > 2 1276.35 1275.64 1277.05 1.29x 369.58s (6.2 min) > > > 4 1052.62 1048.92 1056.32 1.56x 593.31s (9.9 min) > > > 7 892.23 886.59 897.86 1.84x 753.70s (12.6 min) > > > > > > > Thank you for sharing the performance test results! > > > > While the benchmark results look good to me, have you compared the > > performance differences between parallel vacuum in the VACUUM command > > (with the PARALLEL option) and parallel vacuum in autovacuum? Since > > parallel autovacuum introduces some logic to check for delay parameter > > updates, I thought it was worth verifying if this adds any overhead. > > > > BTW, in my view, the most challenging part of this patch is the > > propagation logic for vacuum delay parameters. This propagation is > > necessary because, unlike manual VACUUM, autovacuum workers can reload > > their configuration during operation. We must ensure that parallel > > workers stay synchronized with these updated parameters. > > > > The current patch implements this in vacuumparallel.c: the leader > > shares delay parameters in DSM and updates them (if any vacuum delay > > parameters are updated) after a config reload, while workers poll for > > updates at every vacuum_delay_point() call to refresh their local > > variables. > > > > Another possible approach would be an event-driven model where the > > leader notifies workers after updating shared parameters=E2=80=94for ex= ample, > > by adding a shm_mq between the leader (as the sender) and each worker > > (as the receiver). > > > > I've compared these two ideas and opted for the former (polling). > > While a polling approach could theoretically be costly, the current > > implementation is self-contained within the parallel vacuum logic and > > does not touch the core parallel query infrastructure. The > > notification approach might look more elegant, but I'm concerned it > > adds unnecessary complexity just for the autovacuum case. Since the > > polling is essentially just checking an atomic variable, the overhead > > should be negligible. > > > > To verify this, I conducted benchmarks comparing the whole execution > > time and index vacuuming duration. > > > > Setup: > > > > - Disabled (auto) vacuum delays and buffer usage limits. > > - Parallel autovacuum with 1 worker on a table with 2 indexes (approx. > > 4 GB each). > > - 5 runs. > > > > Case 1: The latest patch (with polling) > > > > Average: 3.95s (Index: 1.54s) > > Median: 3.62s (Index: 1.37s) > > > > Case 2: The latest patch without polling > > > > Average: 3.98s (Index: 1.56s) > > Median: 3.70s (Index: 1.40s) > > > > Note that in order to simulate the code that doesn't have the polling, > > I reverted the following change: > > > > - if (InterruptPending || > > - (!VacuumCostActive && !ConfigReloadPending)) > > + if (InterruptPending) > > + return; > > + > > + if (IsParallelWorker()) > > + { > > + /* > > + * Update cost-based vacuum delay parameters for a parallel aut= ovacuum > > + * worker if any changes are detected. > > + */ > > + parallel_vacuum_update_shared_delay_params(); > > + } > > + > > + if (!VacuumCostActive && !ConfigReloadPending) > > > > The parallel vacuum workers don't check the shared vacuum delay > > parameter at all, which is still fine as I disabled vacuum delays. > > > > Overall, the results show no noticeable overhead from the polling appro= ach. > > I would say this polling approach is very cheap. When there are no > updates, it only has to check a single 32-bit value from shared > memory. And that value doesn't get updated frequently; it's good for > caching. No wonder we see no measurable overhead. Thank you for the comments! > > Regarding the event-driven approach, given that the parallel worker > process is busy with other jobs (doing actual vacuuming), it would > anyway have to poll for new events from time to time. Thus, I don't > think it's possible to organize polling for new events any cheaper > than the current approach of polling for updates in shmem. What do you think about the idea of using proc signals like the patch I've sent recently[1]? With that approach, workers have to check the local variable. It seems slightly cheaper and can use the existing logic. [1] https://www.postgresql.org/message-id/CAD21AoBm0cxQjtWuY0f7%2BaT4UiRV%2= B%2BaFKkzjj6vmERTj_UFnxA%40mail.gmail.com Regards, --=20 Masahiko Sawada Amazon Web Services: https://aws.amazon.com