Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uRevm-00EeNP-7z for pgsql-hackers@arkaria.postgresql.org; Tue, 17 Jun 2025 22:37:26 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uRevj-00DYSp-8W for pgsql-hackers@arkaria.postgresql.org; Tue, 17 Jun 2025 22:37:23 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uRevi-00DYSg-Rx for pgsql-hackers@lists.postgresql.org; Tue, 17 Jun 2025 22:37:23 +0000 Received: from mail-lj1-x22b.google.com ([2a00:1450:4864:20::22b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uRevg-002egM-37 for pgsql-hackers@lists.postgresql.org; Tue, 17 Jun 2025 22:37:23 +0000 Received: by mail-lj1-x22b.google.com with SMTP id 38308e7fff4ca-32b5226e6beso35231691fa.2 for ; Tue, 17 Jun 2025 15:37:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1750199839; x=1750804639; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=2i/0D7WaMrKHibWTnEMi46pvrv+fEvN9hyDpI9dO8yA=; b=OFXz8aKwyQYRvplGGe7oljeFTsKvErSed2bDqirnLLAIxNTGwc+1Ih46QzPf5UsEPK B5XyOSYXWjl2lgTRcOOPD6Vd+8PiE/33XIHywci54RZ/9bxwMupmVkOB56C8Y3yDvNzB oxmYexsS+9eqOFLt8OSmick0L5aRaoxO9z5oPKrzNhTd/AiEJJKq+O1GRIdakbynJjig epgMzwBQSbo5QEKA8evBz6BWRkuabmOdUii6XjoaEBMKOm1joI1VC0+mJtf8DiB3SYqt p28A56c5wJ0OjO2SMMR8BCXEagBuTqQHumM0LXFoVj2wLfXPxh7qWDYv9pkSwWDQXQ7N KFsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1750199839; x=1750804639; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2i/0D7WaMrKHibWTnEMi46pvrv+fEvN9hyDpI9dO8yA=; b=pm+h3ll97Q1To2s994unykWEjQmQ2IiWWY8yscsv0n5rb45MdxeLXzMYmbSKvUPft2 cXTW0exeZFCwuCJETh+/WTuJwP0NBy4yQDo24z5kIkhR39v/iXzCJsLTTCQ4lOHGGSG/ R2jxyVtb0+E0DFnGopdAqGgcnlxML8ICFUObqPIWFYsCBbnhGS/UDTSh8raNjS5Id8Wg Rgd5H7K/XF+sIcfK+jjswhhvLzsmJqW8uB55lX39EpQsbMqn4TAJ4T8BOSsT9BWi0C1D WmwoewVIoxqxekcfjrqy5OYmPnNPR79f4oBIFtYRWaO6raNwPBLx0GHKoyo17D67oavu EcFQ== X-Forwarded-Encrypted: i=1; AJvYcCWEY77mQOipu562wxQ9+bpC3iD/o8MeqZUAhviEbqAeZJIrIvj00mtG8DR+YO4fc2k65PmfU1zdrVnJ3t0w@lists.postgresql.org X-Gm-Message-State: AOJu0YyIwhlrO4aU1FRLEU51LEp7961h4ml+Dxj2VzYhfOJfyNKVbM+W pxdcHrpu7VK46jmO8OsqKNrsfoj01UHFrDuHWUZkElzv6MXe8+2T8Mqfna7mE5u/C4hV/RY67r+ YHFNWd0XL21DkzCSje+JtI4E97LbYknQ= X-Gm-Gg: ASbGnctpoUs7mIH3mhp4pIftWuZzMc5RK3Dg+SERhyiXja5FBhvaOc4SgV0SI7wmZdz qT01YuzTg0jHPGhp/hZxl5Nr0MBZmDRjYJbgdvqmkIyJNNndxtLwUzB412tFXzsM71nGxt0AuP7 hE8w0L1pI66Z9RYSXD6fNDBKNkaaAVUAjlq4P3C4TaHRNJ X-Google-Smtp-Source: AGHT+IEe59zkJRr6YCorFmey8aS7IvU+0/PC4LKYasA6DzDBTPKB/oFd6jSQohuZFBdY5dxS5hv4J/S+TMZ/HoNlDJ4= X-Received: by 2002:a2e:a7c1:0:b0:32b:55bd:d948 with SMTP id 38308e7fff4ca-32b55bddb6bmr34794871fa.11.1750199838160; Tue, 17 Jun 2025 15:37:18 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Masahiko Sawada Date: Tue, 17 Jun 2025 15:36:41 -0700 X-Gm-Features: AX0GCFv4L5aAp0W-UpEzCZ4jWj4FBiUSOOadbodtMO0QyoErQgzxxp7-jwubYf4 Message-ID: Subject: Re: POC: Parallel processing of indexes in autovacuum To: Daniil Davydov <3danissimo@gmail.com> Cc: Matheus Alcantara , Sami Imseih , Maxim Orlov , Postgres hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Sun, May 25, 2025 at 10:22=E2=80=AFAM Daniil Davydov <3danissimo@gmail.c= om> wrote: > > Hi, > > On Fri, May 23, 2025 at 6:12=E2=80=AFAM Masahiko Sawada wrote: > > > > On Thu, May 22, 2025 at 12:44=E2=80=AFAM Daniil Davydov <3danissimo@gma= il.com> wrote: > > > > > > On Wed, May 21, 2025 at 5:30=E2=80=AFAM Masahiko Sawada wrote: > > > > > > > > I find that the name "autovacuum_reserved_workers_num" is generic. = It > > > > would be better to have a more specific name for parallel vacuum su= ch > > > > as autovacuum_max_parallel_workers. This parameter is related to > > > > neither autovacuum_worker_slots nor autovacuum_max_workers, which > > > > seems fine to me. Also, max_parallel_maintenance_workers doesn't > > > > affect this parameter. > > > > > > This was my headache when I created names for variables. Autovacuum > > > initially implies parallelism, because we have several parallel a/v > > > workers. > > > > I'm not sure if it's parallelism. We can have multiple autovacuum > > workers simultaneously working on different tables, which seems not > > parallelism to me. > > Hm, I didn't thought about the 'parallelism' definition in this way. > But I see your point - the next v4 patch will contain the naming that > you suggest. > > > > > > So I think that parameter like > > > `autovacuum_max_parallel_workers` will confuse somebody. > > > If we want to have a more specific name, I would prefer > > > `max_parallel_index_autovacuum_workers`. > > > > It's better not to use 'index' as we're trying to extend parallel > > vacuum to heap scanning/vacuuming as well[1]. > > OK, I'll fix it. > > > > > + /* > > > > + * If we are running autovacuum - decide whether we need to pro= cess indexes > > > > + * of table with given oid in parallel. > > > > + */ > > > > + if (AmAutoVacuumWorkerProcess() && > > > > + params->index_cleanup !=3D VACOPTVALUE_DISABLED && > > > > + RelationAllowsParallelIdxAutovac(rel)) > > > > > > > > I think that this should be done in autovacuum code. > > > > > > We need params->index cleanup variable to decide whether we need to > > > use parallel index a/v. In autovacuum.c we have this code : > > > *** > > > /* > > > * index_cleanup and truncate are unspecified at first in autovacuum. > > > * They will be filled in with usable values using their reloptions > > > * (or reloption defaults) later. > > > */ > > > tab->at_params.index_cleanup =3D VACOPTVALUE_UNSPECIFIED; > > > tab->at_params.truncate =3D VACOPTVALUE_UNSPECIFIED; > > > *** > > > This variable is filled in inside the `vacuum_rel` function, so I > > > think we should keep the above logic in vacuum.c. > > > > I guess that we can specify the parallel degree even if index_cleanup > > is still UNSPECIFIED. vacuum_rel() would then decide whether to use > > index vacuuming and vacuumlazy.c would decide whether to use parallel > > vacuum based on the specified parallel degree and index_cleanup value. > > > > > > > > > +#define AV_PARALLEL_DEADTUP_THRESHOLD 1024 > > > > > > > > These fixed values really useful in common cases? I think we alread= y > > > > have an optimization where we skip vacuum indexes if the table has > > > > fewer dead tuples (see BYPASS_THRESHOLD_PAGES). > > > > > > When we allocate dead items (and optionally init parallel autocuum) w= e > > > don't have sane value for `vacrel->lpdead_item_pages` (which should b= e > > > compared with BYPASS_THRESHOLD_PAGES). > > > The only criterion that we can focus on is the number of dead tuples > > > indicated in the PgStat_StatTabEntry. > > > > My point is that this criterion might not be useful. We have the > > bypass optimization for index vacuuming and having many dead tuples > > doesn't necessarily mean index vacuuming taking a long time. For > > example, even if the table has a few dead tuples, index vacuuming > > could take a very long time and parallel index vacuuming would help > > the situation, if the table is very large and has many indexes. > > That sounds reasonable. I'll fix it. > > > > But autovacuum (as I think) should work as stable as possible and > > > `unnoticed` by other processes. Thus, we must : > > > 1) Compute resources (such as the number of parallel workers for a > > > single table's indexes vacuuming) as efficiently as possible. > > > 2) Provide a guarantee that as many tables as possible (among > > > requested) will be processed in parallel. > > > > > > (1) can be achieved by calculating the parameters on the fly. > > > NUM_INDEXES_PER_PARALLEL_WORKER is a rough mock. I can provide more > > > accurate value in the near future. > > > > I think it requires more things than the number of indexes on the > > table to achieve (1). Suppose that there is a very large table that > > gets updates heavily and has a few indexes. If users want to avoid the > > table from being bloated, it would be a reasonable idea to use > > parallel vacuum during autovacuum and it would not be a good idea to > > disallow using parallel vacuum solely because it doesn't have more > > than 30 indexes. On the other hand, if the table had got many updates > > but not so now, users might want to use resources for autovacuums on > > other tables. We might need to consider autovacuum frequencies per > > table, the statistics of the previous autovacuum, or system loads etc. > > So I think that in order to achieve (1) we might need more statistics > > and using only NUM_INDEXES_PER_PARALLEL_WORKER would not work fine. > > > > It's hard for me to imagine exactly how extended statistics will help > us track such situations. > It seems that for any of our heuristics, it will be possible to come > up with a counter example. > Maybe we can give advices (via logs) to the user? But for such an > idea, tests should be conducted so that we can understand when > resource consumption becomes ineffective. > I guess that we need to agree on an implementation before conducting such= tests. > > > > (2) can be achieved by workers reserving - we know that N workers > > > (from bgworkers pool) are *always* at our disposal. And when we use > > > such workers we are not dependent on other operations in the cluster > > > and we don't interfere with other operations by taking resources away > > > from them. > > > > Reserving some bgworkers for autovacuum could make sense. But I think > > it's better to implement it in a general way as it could be useful in > > other use cases too. That is, it might be a good to implement > > infrastructure so that any PostgreSQL code (possibly including > > extensions) can request allocating a pool of bgworkers for specific > > usage and use bgworkers from them. > > Reserving infrastructure is an ambitious idea. I am not sure that we > should implement it within this thread and feature. > Maybe we should create a separate thread for it and as a > justification, refer to parallel autovacuum? > > ----- > Thanks everybody for feedback! I attach a v4 patch to this letter. > Main features : > 1) 'parallel_autovacuum_workers' reloption - integer value, that sets > the maximum number of parallel a/v workers that can be taken from > bgworkers pool in order to process this table. > 2) 'max_parallel_autovacuum_workers' - GUC variable, that sets the > maximum total number of parallel a/v workers, that can be taken from > bgworkers pool. > 3) Parallel autovacuum does not try to use thresholds like > NUM_INDEXES_PER_PARALLEL_WORKER and AV_PARALLEL_DEADTUP_THRESHOLD. > 4) Parallel autovacuum now can report statistics like "planned vs. launch= ed". > 5) For now I got rid of the 'reserving' idea, so now autovacuum > leaders are competing with everyone for parallel workers from the > bgworkers pool. > > What do you think about this implementation? > I think it basically makes sense to me. A few comments: --- The patch implements max_parallel_autovacuum_workers as a PGC_POSTMASTER parameter but can we make it PGC_SIGHUP? I think we don't necessarily need to make it a PGC_POSTMATER since it actually doesn't affect how much shared memory we need to allocate. --- I think it's better to have the prefix "autovacuum" for the new GUC parameter for better consistency with other autovacuum-related GUC parameters. --- #include "storage/spin.h" @@ -514,6 +515,11 @@ ReinitializeParallelDSM(ParallelContext *pcxt) { WaitForParallelWorkersToFinish(pcxt); WaitForParallelWorkersToExit(pcxt); + + /* Release all launched (i.e. reserved) parallel autovacuum workers= . */ + if (AmAutoVacuumWorkerProcess()) + ParallelAutoVacuumReleaseWorkers(pcxt->nworkers_launched); + pcxt->nworkers_launched =3D 0; if (pcxt->known_attached_workers) { @@ -1002,6 +1008,11 @@ DestroyParallelContext(ParallelContext *pcxt) */ HOLD_INTERRUPTS(); WaitForParallelWorkersToExit(pcxt); + + /* Release all launched (i.e. reserved) parallel autovacuum workers. */ + if (AmAutoVacuumWorkerProcess()) + ParallelAutoVacuumReleaseWorkers(pcxt->nworkers_launched); + RESUME_INTERRUPTS(); I think that it's better to release workers in vacuumparallel.c rather than parallel.c. Regards, --=20 Masahiko Sawada Amazon Web Services: https://aws.amazon.com