Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vvOnJ-00HUnF-09 for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Feb 2026 23:59:53 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vvOnH-0091Mq-2e for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Feb 2026 23:59:51 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vvOnH-0091Mh-0f for pgsql-hackers@lists.postgresql.org; Wed, 25 Feb 2026 23:59:51 +0000 Received: from mail-lf1-x12b.google.com ([2a00:1450:4864:20::12b]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vvOnD-00000001CO6-2EUp for pgsql-hackers@lists.postgresql.org; Wed, 25 Feb 2026 23:59:50 +0000 Received: by mail-lf1-x12b.google.com with SMTP id 2adb3069b0e04-5a1046f62d0so295964e87.1 for ; Wed, 25 Feb 2026 15:59:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1772063985; cv=none; d=google.com; s=arc-20240605; b=e6wvOCs4tywj4XV9lwGh6sAMHE26OFbta8a74a8Z8bjJQF6HjQnn2J6fMNI15kGaOU LQZ1KxQ2Ko3Qx0cCLL4IhhFVQMucRCTIO8Mo+0NshP2CdltreI/Hdge3fZf77fH+vqBt RbqNMyEigANscU1d1qjt38D4Fwti/P358Z1z/SiJ0YROgbB0UbuKtKVmVTl2+BN8PTfq uv0DuZxTbZISDPAExPordj5Z0Ghp0/qRBKeTmpVVjWP7Q3juYyBTA6RQVSsbbSxipBIo vEa6RTXNbn04zRQD8UUBrBN0QcDBKyEuiZOfP111PFA1OEQvg7Nu5OshyU+IEd2y3YNl 3TvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=B9tBlTzKmlwzxRTL4O2cIA4K54/eienrSIKi0IAVFy4=; fh=gY59haoxNSMkeZz1kVVJjpNejAK7PAaODuubSDaUeHc=; b=FSLPLy4fr5tbUmoOaXim3PCvM4wiahwJMElt7PuDbQB/H0MHaTDbbbxJ/4J8KF83Zu 6flIYoQQHCXTN2xaruCS1e5w7ydEGu9AUtI6z9td1+VV0ahxJVU8keSUrLUtAlNNQkKy AKhwSuNZ3RQkrIEt9sSf52OQZfUGndHg0OS63gK9361OlHD4deHzqUO891mnhpWGN1vB wwnAYzY84Ji0zOKv3jGADUo19vprrQCEwSm+mG9S6mXwDTswK+ID2PWMYdRqerHh0K4g eCOr+6C7XNN+ug9KmPRCw6NL5cmpXRGnBd6sG9elN7p9qKQVV7Hem4uKmsoCF3GNr7z3 yijg==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1772063985; x=1772668785; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=B9tBlTzKmlwzxRTL4O2cIA4K54/eienrSIKi0IAVFy4=; b=Zcq2uNny22B1kXBJ5+aW4aWRRwIvzzAM25OB8teDKgghbZHxvsg3n2gxSrCjsxuAmN 7ejUOH1o+h+KEKxXmaiUntg555kcYUS+hRuKz5pSw/OexdA3qCqwagqI9JdW8kojlg3v VBA+u7dCNtFL5TZA5qJpoOHJLFk5moEFRd9Z4CFuGGBV7JwvbsoTIW9tezqlk9klQ/pQ joZ+KwfuDtRAbnJ0JgicuHGnme9/gn1xroaIbRFiNw39nJxVOOpXOIfdKVgM1bCuBu+A y2vWuq2mu0zS0M26JFn2oFgT+oZKwDlETwbF5w6WleiYMeV+xtjewfznY/b55INcgOr5 KeyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772063985; x=1772668785; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=B9tBlTzKmlwzxRTL4O2cIA4K54/eienrSIKi0IAVFy4=; b=CRhEOcZupouBYi4MMbsZTfyd6cCgg/3uhZJJapGmEjwdVjJnr23tV0xFQEr2Mvyrw0 WBTSS4bhPPFm/xAoQvHM2iuKcg+OsNobprofflAh+X4fbyw84EmXxV5JqJgsqGy1Cags cXoeQP8rYBC3Crs6N0hu0Pzg6Ul5ETjBTtcWc9ReyDKMAEUtX17YFdkmbRDZwP0BZTPT 1Auy1tTM22SK4cAnPbT3g3JzdUE4pYzk04PPTWd3NxCdw+etp0aDlRpWGNeXTSWNmPFR SPmnhcDoht31IBM05ihjLokpSnt6nVX2FIiZ3r4RERBJTTmIj/K2eEsEdYDAK68fnSib 8Nbw== X-Forwarded-Encrypted: i=1; AJvYcCWrZaiJ5KCdrrEg4iPh/W3iOMG27+Q678+oPlEBL+6gfkVuc+Hbli8jgZfQ+g8UURt6rX97X/aLSx3o8iWB@lists.postgresql.org X-Gm-Message-State: AOJu0YybnFT/oKb06nMkaBTf4qzd30+ivUF/4WRsKNp7Xbk4tEwUxyQM jgijjZ6NXEH4xhivkj6SkejMhngKzCYFVi6J49OfkqIb0O2FsDiYxPWz+P1LVoI8hj7pI2Tqte7 hpwX3DrMVdQSy3ix/QwCd9NKzGq1urPc= X-Gm-Gg: ATEYQzxNIBhfTBb5+H9M1t1VrBTEKMBOyvmyWxsEo7gJUShdIHGVFUhPx8IVA83G8a2 xo637zFGMD8B3hMxekB1MwBWHkihMjxZpAA/S9cuaFX1BU5TW50g2X9FzlS5DDbNYx/Hx83nvtO 5/l6ycyf4vKLvCOEKVOxsTG38umABaZOt+rX19tmObS1SdoSRdvQMRIUt0o4pzPpfC8zmLokkU8 fOhKR4FwUEph9iLEJByTb44LVvQ+zz0Ry/dWEtFN7klVNzVBk9HLILfKmfA41OrQXxCObATv1ZK GGoYewwo X-Received: by 2002:a05:6512:23a4:b0:59e:62c5:94a7 with SMTP id 2adb3069b0e04-5a109a7cd64mr124533e87.12.1772063985034; Wed, 25 Feb 2026 15:59:45 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Masahiko Sawada Date: Wed, 25 Feb 2026 15:59:05 -0800 X-Gm-Features: AaiRm51HRQIKp-5yn3On2l7dDwrxlL8_45jvvBucF14O4_6CKHnqhykWYpJWu0g Message-ID: Subject: Re: POC: Parallel processing of indexes in autovacuum To: Daniil Davydov <3danissimo@gmail.com> Cc: Sami Imseih , Alexander Korotkov , Matheus Alcantara , Maxim Orlov , Postgres hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, Feb 11, 2026 at 12:04=E2=80=AFAM Daniil Davydov <3danissimo@gmail.c= om> wrote: > > On Thu, Jan 22, 2026 at 5:29=E2=80=AFAM Masahiko Sawada wrote: > > > > On Sat, Jan 17, 2026 at 6:52=E2=80=AFAM Daniil Davydov <3danissimo@gmai= l.com> wrote: > > > > > > I will keep the 'max_worker_processes' limit, so autovacuum will not > > > waste time initializing a parallel context if there is no chance that > > > the request will succeed. > > > But it's worth remembering that actually the > > > 'autovacuum_max_parallel_workers' parameter will always be implicitly > > > capped by 'max_parallel_workers'. > > > > It doesn't make sense to me that we limit > > autovacuum_max_parallel_workers by max_worker_processes TBH. When > > users want to have more parallel vacuum workers for autovacuum and the > > VACUUM command, they would have to consider max_worker_processes, > > max_parallel_workers, and max_parallel_maintenance_workers separately. > > Given that max_parallel_workers is controlling the number of > > max_worker_processes that can be used in parallel operations, I > > believe that parallel vacuum workers for autovacuum should also be > > taken from that pool. > > Maybe I don't quite understand the meaning of "limited by". For example, > we have a max_parallel_workers_per_gather parameter, which is limited > by max_parallel_workers. But actually we can set this parameter to a valu= e > that is higher than max_parallel_workers. The limitation is that for Gath= er > node we cannot request more workers than are available in bgworkers pool > (where number of free workers is always <=3D max_parallel_workers). Thus, > limitation actually exists only for bgworkers pool, on which other parall= el > operations depend. In particular, whatever values we set for the > autovacuum_max_parallel_workers parameter, it always will depend only > on bgworkers pool. Right, parallel workers are actually taken from bgworkers pool. > > I'll give in to your opinion and add a limitation by max_parallel_workers= . > But I still don't understand where the point is in explicit limitation by > max_parallel_workers, if we already have this limitation implicitly? > It seems a bit redundant for me. I hope I've conveyed my point correctly. max_worker_processes controls the number of available bgworkers in the database cluster and bg workers are used for parallel queries, logical replication, or any other extensions as well. Also, it requires a server restart to change. max_parallel_workers controls "how many bgworkers can be used for parallel queries in total?" and is a PGC_USERSET parameter. I think it's easier for users to tune parallel query related parameters since all bgworkers for parallel queries (i.e., parallel workers) are taken from max_parallel_workers pool. For example, if users want to disable all parallel queries, they can do that by setting max_parallel_workers to 0. If parallel vacuum workers for autovacuums are taken from max_worker_processes pool (i.e., without max_paralle_workers limit), users would need to set both max_parallel_workers and autovacuum_max_parallel_workers to 0. > -- > > > + /* > > + * If 'true' then we are running parallel autovacuum. Otherwise, we= are > > + * running parallel maintenence VACUUM. > > + */ > > + bool am_parallel_autovacuum; > > > > How about renaming it to use_shared_delay_params? I think it conveys > > better what the field is used for. > > I think that we should leave this name, because in the future some other > behavior differences may occur between manual VACUUM and autovacuum. > If so, we will already have an "am_autovacuum" field which we can use in > the code. > The existing logic with the "am_autovacuum" name is also LGTM - we should > use shared delay params only because we are running parallel autovacuum. It may occur but we can change the field name when it really comes. I'm slightly concerned that we've been using am_xxx variables in a different way. For instance, am_walsender is a global variable that is set to true only in wal sender processes. Also we have a bunch of AmXXProcess() macros that checks the global variable MyBackendType, to check the kinds of the current process. That is, the subject of 'am' is typically the process, I guess. On the other hand, am_parallel_autovacuum is stored in DSM space and indicates whether a parallel vacuum is invoked by manual VACUUM or autovacuum. > > > Truncating all logs every after test would decrease the debuggability > > much. We can pass the offset as the start point to wait for the > > contents. > > > > I've combined two of your above comments purposely. I agree that truncati= ng > all logs is a bad decision and we need to solve this in a different way. = But the > problem will occur If we want to 1) use logging instead of a test-only fu= nction > and 2) use offsets as the start point to wait for the contents in the log= file. > > Imagine that we (using the described approach) need to wait until the end= of > parallel index processing and determine the current number of free parall= el > workers. > > IIUC, It'll look like this : > wait_for_av_log("autovacuum processing finished"); > wait_for_av_log("number of free workers =3D N"); > > But when we call wait_for_av_log first time, we will advance "offset" to = the > end of logfile and thus we will miss the "number of free workers =3D N". = The > only way to avoid it is to write a function that will determine the exact > position of "autovacuum processing finished" in the logfile. Isn't it too= much? > > I think that using wait_for_av_log("autovacuum processing finished"); + > SELECT get_parallel_autovacuum_free_workers(); will be much more > demonstrably and simply. > > Moreover, the AutoVacuumGetFreeParallelWorkers function doesn't > seem completely useless in isolation from tests. I suggest leaving > this function and its usage in the tests. I can remove the "For testing > purpose only!" comment, so everyone will be free to use this function > in the future. Agreed. The updated test scenario looks reasonable to me. > > 1) > Test 5 can be stabilized as follows : > We can attach to the "autovacuum-start-parallel-vacuum" injection point i= n > the "wait" mode. Thereby when we terminate the first a/v leader, we are > guaranteed that no other a/v leader will reach release/reserve functions. > And then we are free to call the get_parallel_autovacuum_free_workers > function. I'll additionally describe this logic in the test. > > 2) > In the test 4 I found another problem : when a/v leader errors out, it wi= ll > exit() pretty soon. And during exit() it will call the before_shmem_exit = hook. > Thus, we cannot be sure that parallel workers has been released exactly > in the try/catch block. In order to guarantee it, I think that we should = log > something inside the try/catch block. I added a pretty controversial logg= in > code for it, but it is the best I came up with. > > In the test 4 the above idea will look something like this: > $log_start =3D $node->wait_for_log( > qr/error triggered for injection point / . > qr/autovacuum-leader-before-indexes-processing/, > $log_start > ); > $log_start =3D $node->wait_for_log( > qr/2 parallel autovacuum workers has been released after occured erro= r/, > $log_start > ); > > Above I described a problem that may occur when we advance > "logfile offset" too far after the first wait_for_log call. Here, this pr= oblem > doesn't occur because the autovacuum launcher infinitely tries to > vacuum the table, so other "N workers released" messages occur. If we write the log "%d parallel autovacuum workers have been released" in AutoVacuumReleaseParallelWorkres(), can we simplify both tests (4 and 5) further? I've reviewed all patches. The 0001 patch looks good to me. 0002 patch: + /* Worker usage stats for parallel autovacuum. */ + appendStringInfo(&buf, + _("parallel index vacuum: %d workers were planned, %d workers were reserved and %d workers were launched in total\n"), + vacrel->workers_usage.vacuum.nplanned, + vacrel->workers_usage.vacuum.nreserved, + vacrel->workers_usage.vacuum.nlaunched); These log messages need to take care of plural forms but it seems to be too long if we use errmsg_plural() for each number. So how about something like: parallel workers: index: %d planned, %d reserved, %d launched in total parallel workers: cleanup %d planned, %d reserved, %d launched (Index cleanup is executed at most once so we don't need "in total" in the message.) 0003 patch: +typedef struct CostParamsData +{ + double cost_delay; + int cost_limit; + int cost_page_dirty; + int cost_page_hit; + int cost_page_miss; +} CostParamsData; The name CostParamsData sounds too generic and I guess it could conflict with optimizer-related struct names in the future. How about renaming it to VacuumDelayParams? --- + SpinLockAcquire(&pv_shared_cost_params->mutex); + + shared_params_data =3D pv_shared_cost_params->params_data; + + VacuumCostDelay =3D shared_params_data.cost_delay; + VacuumCostLimit =3D shared_params_data.cost_limit; + VacuumCostPageDirty =3D shared_params_data.cost_page_dirty; + VacuumCostPageHit =3D shared_params_data.cost_page_hit; + VacuumCostPageMiss =3D shared_params_data.cost_page_miss; + + SpinLockRelease(&pv_shared_cost_params->mutex); If we copy the shared values in pv_shared_cost_params, we should release the spinlock earlier, i.e., before updating VacuumCostXXX variables. But I don't think we would even need to set these values in the local variables in this case as updating 4 local variables is fairly cheap. --- + FillCostParamsData(&local_params_data); + SpinLockAcquire(&pv_shared_cost_params->mutex); + + if (CostParamsDataEqual(pv_shared_cost_params->params_data, + local_params_data)= ) + { IIUC it stores cost-based vacuum delay parameters into the local_params_data only for using CostParamsDataEqual() macro. I think it's better to directly compare values in pv_shared_cost_params and the cost-based vacuum delay parameters. 0004 patch: + if (nworkers > 0) + INJECTION_POINT("autovacuum-leader-before-indexes-processing", NULL); I think it's better to use #ifdef USE_INJECTION_POINTS here. --- +#ifdef USE_INJECTION_POINTS +/* + * Log values of the related to cost-based delay parameters. It is used fo= r s/values of the related to/values related to/ --- + * testing purpose. + */ +static void +parallel_vacuum_report_cost_based_params(void) +{ + StringInfoData buf; + + /* Simulate config reload during normal processing */ + pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1); + vacuum_delay_point(false); + pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1); Calling vacuum_delay_point() here feels a bit arbitrary to me. Since parallel vacuum workers are calling parallel_vacuum_report_cost_based_params() after parallel_vacuum_process_safe_indexes(), I think we don't necessarily call vacuum_delay_point() here. --- + appendStringInfo(&buf, "Vacuum cost-based delay parameters of parallel worker:\n"); + appendStringInfo(&buf, "vacuum_cost_limit =3D %d\n",vacuum_cost_lim= it); + appendStringInfo(&buf, "vacuum_cost_delay =3D %g\n", vacuum_cost_de= lay); + appendStringInfo(&buf, "vacuum_cost_page_miss =3D %d\n", VacuumCostPageMiss); + appendStringInfo(&buf, "vacuum_cost_page_dirty =3D %d\n", VacuumCostPageDirty); + appendStringInfo(&buf, "vacuum_cost_page_hit =3D %d\n", VacuumCostPageHit); I'd write these messages directly in elog() instead of using StringInfoData, which is simpler and can save palloc()/pfree(). --- + ereport(DEBUG2, errmsg("%s", buf.data)); Let's use elog() instead of ereport(). --- +# Create role with pg_signal_autovacuum_worker for terminating autovacuum worker. +$node->safe_psql('postgres', qq{ + CREATE ROLE regress_worker_role; + GRANT pg_signal_autovacuum_worker TO regress_worker_role; + SET ROLE regress_worker_role; +}); + +$node->safe_psql('postgres', qq{ + SELECT pg_terminate_backend('$av_pid'); +}); These two safe_psql calls use separate connections, meaning that pg_terminate_backend() is executed by the superuser rather than regress_worker_role. I think we don't need to create the regrss_worker_role and we can use the superuser in this test case. --- We would add more autovacuum related tests to the test_autovacuum directory in the future. Given that the 001_basic.pl focuses on parallel vacuum tests, how about renaming it to 001_parallel_vacuum.pl or something? > This time I'll try something experimental - besides the patches I'll also > post differences between corresponding patches from v20 and v21. > I.e. you can apply v20--v21-diff-for-0001 on the v20-0001 patch and > get the v21-0001 patch. There are a lot of changes, so I guess it will > help you during review. Please, let me know whether it is useful for you= . It was helpful to easily see the changes from the previous version. Thank y= ou! Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com