From: Masahiko Sawada
Date: Tue, 16 Sep 2025 11:30:22 -0700
Subject: Re: POC: Parallel processing of indexes in autovacuum
To: Alexander Korotkov
Cc: Daniil Davydov <3danissimo@gmail.com>, Sami Imseih, Matheus Alcantara, Maxim Orlov, Postgres hackers

On Mon, Sep 15, 2025 at
11:50 AM Alexander Korotkov wrote:
>
> Hi!
>
> On Tue, Aug 19, 2025 at 12:04 AM Masahiko Sawada wrote:
> >
> > On Mon, Aug 18, 2025 at 1:31 AM Daniil Davydov <3danissimo@gmail.com> wrote:
> > >
> > >
> > > On Fri, Aug 15, 2025 at 3:41 AM Masahiko Sawada wrote:
> > >
> > > >
> > > > 2. when an autovacuum worker (not parallel vacuum worker) who uses
> > > > parallel vacuum gets SIGHUP, it errors out with the error message
> > > > "parameter "max_stack_depth" cannot be set during a parallel
> > > > operation". Autovacuum checks the configuration file reload in
> > > > vacuum_delay_point(), and while reloading the configuration file, it
> > > > attempts to set max_stack_depth in
> > > > InitializeGUCOptionsFromEnvironment() (which is called by
> > > > ProcessConfigFileInternal()). However, it cannot change
> > > > max_stack_depth since the worker is in parallel mode but
> > > > max_stack_depth doesn't have GUC_ALLOW_IN_PARALLEL flag. This doesn't
> > > > happen in regular backends who are using parallel queries because they
> > > > check the configuration file reload at the end of each SQL command.
> > > >
> > >
> > > Hm, this is a really serious problem. I see only two ways to solve it
> > > (both are not really good) :
> > > 1)
> > > Do not allow processing of the config file during parallel autovacuum
> > > execution.
> > >
> > > 2)
> > > Teach the autovacuum to enter parallel mode only during the index
> > > vacuum/cleanup phase. I'm a bit wary about it, because the design says
> > > that we should be in parallel mode during the whole parallel operation.
> > > But actually, if we can make sure that all launched workers are exited,
> > > I don't see reasons, why can't we just exit parallel mode at the end of
> > > parallel_vacuum_process_all_indexes.
> > >
> > > What do you think about it?
> >
> > Hmm, given that we're trying to support parallel heap vacuum on
> > another thread[1] and we will probably support it in autovacuums, it
> > seems to me that these approaches won't work.
> >
> > Another idea would be to allow autovacuum workers to process the
> > config file even in parallel mode. GUC changes in the leader worker
> > would not affect parallel vacuum workers, but it is fine to me. In the
> > context of autovacuum, only specific GUC parameters related to
> > cost-based delays need to be affected also to parallel vacuum workers.
> > Probably we need some changes to compute_parallel_delay() so that
> > parallel workers can compute the sleep time based on the new
> > vacuum_cost_limit and vacuum_cost_delay after the leader process
> > (i.e., autovacuum worker) reloads the config file.
> >
> > >
> > > Again, thank you for the review. Please, see v10 patches (only 0001
> > > has been changed) :
> > > 1) Reserve and release workers only inside
> > > parallel_vacuum_process_all_indexes.
> > > 2) Add try/catch block to the parallel_vacuum_process_all_indexes, so
> > > we can release workers even after an error. This required adding a
> > > static variable to account for the total number of reserved workers
> > > (av_nworkers_reserved).
> > > 3) Cap autovacuum_max_parallel_workers by max_worker_processes only
> > > inside autovacuum code. Assign hook has been removed.
> > > 4) Use shmem value for determining the maximum number of parallel
> > > autovacuum workers (eliminate race condition between launcher and
> > > leader process).
> >
> > Thank you for updating the patch! I'll review the new version patches.
>
> I've rebased this patchset to the current master. That required me to
> move the new GUC definition to guc_parameters.dat. Also, I adjusted
> typedefs.list and made pgindent. Some notes about the patch.

Thank you for updating the patch!

> I see parallel_vacuum_process_all_indexes() have a TRY/CATCH block.
> As I heard, the overhead of setting/doing jumps is platform-dependent,
> and not harmless on some platforms. Therefore, can we skip TRY/CATCH
> block for non-autovacuum vacuum? Possibly we could move it to
> AutoVacWorkerMain(), that would save us from repeatedly setting a jump
> in autovacuum workers too.

I wonder whether the TRY/CATCH block alone is enough to ensure that
autovacuum workers release the reserved parallel workers in FATAL
cases.

> In general, I think this patchset badly lack of testing. I think it
> needs tap tests checking from the logs that autovacuum has been done in
> parallel. Also, it would be good to set up some injection points, and
> check that reserved autovacuum parallel workers are getting released
> correctly in the case of errors.

+1

IIUC the patch still has one problem with reloading the configuration
parameters during parallel mode, as I mentioned before[1].

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBRRXbNJEvCjS-0XZgCEeRBzQPKmrSDjJ3wZ8TN28vaCQ%40mail.gmail.com

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
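
P.S. To illustrate the FATAL concern, here is a minimal standalone sketch in plain C. It is not PostgreSQL code and not taken from the patch: every name in it (nworkers_reserved, report, run_vacuum, and so on) is hypothetical. It assumes a simplified model in which ERROR jumps to the nearest catch block while FATAL runs exit callbacks and leaves the process without visiting any catch block. To my understanding that is roughly how PG_TRY/PG_CATCH and proc_exit() callbacks such as before_shmem_exit() relate, but please treat it as an assumption to verify.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the count of reserved parallel workers. */
static int nworkers_reserved = 0;

/* Simplified error levels: ERROR is catchable, FATAL ends the process. */
typedef enum { LEVEL_NONE, LEVEL_ERROR, LEVEL_FATAL } Level;

static jmp_buf *catch_target = NULL;        /* innermost active "TRY" block */
static jmp_buf process_exit;                /* models leaving the process */
static void (*exit_callback)(void) = NULL;  /* models an exit-time callback */

static void release_workers(void)
{
    nworkers_reserved = 0;
}

/* Report a failure at the given level (LEVEL_NONE does nothing). */
static void report(Level level)
{
    if (level == LEVEL_ERROR && catch_target != NULL)
        longjmp(*catch_target, 1);      /* handled by the catch block */

    if (level != LEVEL_NONE)
    {
        /* FATAL or uncaught ERROR: catch blocks are skipped entirely. */
        if (exit_callback != NULL)
            exit_callback();
        longjmp(process_exit, 1);
    }
}

/*
 * Simulate one parallel index vacuum pass that may fail. Returns the
 * number of workers still reserved once the simulated process is done.
 */
static int run_vacuum(Level failure, bool use_exit_callback)
{
    jmp_buf local;

    nworkers_reserved = 0;
    catch_target = NULL;
    exit_callback = use_exit_callback ? release_workers : NULL;

    if (setjmp(process_exit) == 0)
    {
        nworkers_reserved = 2;          /* reserve two workers */
        catch_target = &local;
        if (setjmp(local) == 0)
        {
            report(failure);            /* index processing may fail here */
            release_workers();          /* normal path */
        }
        else
        {
            release_workers();          /* catch path: reached on ERROR only */
        }
    }
    catch_target = NULL;
    return nworkers_reserved;
}
```

In this model, run_vacuum(LEVEL_ERROR, false) leaves nothing reserved, run_vacuum(LEVEL_FATAL, false) leaves both workers reserved, and run_vacuum(LEVEL_FATAL, true) releases them through the exit callback. If the real code behaves the same way, registering a release callback for process exit would cover the FATAL case that the TRY/CATCH block misses.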