From: Tom Lane
To: Alexander Lakhin
cc: Michael Paquier, "Iwata, Aya/岩田 彩", Peter Smith, "Kuroda, Hayato/黒田 隼人", Pavel Stehule, Chao Li, pgsql-hackers
Subject: Re: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE
References: <1020519.1773863522@sss.pgh.pa.us>
Comments: In-reply-to Alexander Lakhin message dated "Tue, 31 Mar 2026 20:00:00 +0300"
Date: Wed, 01 Apr 2026 16:20:10 -0400
Message-ID: <3074140.1775074810@sss.pgh.pa.us>

Alexander Lakhin writes:
> I think this can explain slow CommitTransactionCommand() and why it
> happens not every time. Regarding other animals, I guess they can
> experience the same bumps but not exceeding 5 seconds (50 tries). Thus,
> from my understanding, for the failure to happen, we need to have slow
> storage and initialize_worker_spi() -> CommitTransactionCommand() reaching
> XLogFileClose().

So, it remains not very clear why only widowbird is showing this
failure, but I think we can safely take away the bottom-line
conclusion that hard-wiring a maximum wait of 5s in
CountOtherDBBackends() was not a great idea.

I don't think I want to propose a GUC for this, but could it make
sense to check for an environment variable, similarly to
PGCTLTIMEOUT and PG_TEST_TIMEOUT_DEFAULT?  Whichever way we do it,
it could replace the existing crude hack to change the max wait
in injection-point mode.

			regards, tom lane
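
[Illustration, not part of the original message.] The environment-variable idea above could look roughly like the sketch below. It is not taken from any patch on this thread. The variable name PG_EXAMPLE_DB_WAIT_SECS and both function names are hypothetical, and the 100ms-per-try figure is only inferred from the "5 seconds (50 tries)" remark in the quoted text. The sketch falls back to the current 5s limit whenever the variable is unset or does not parse as a positive integer.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define DEFAULT_WAIT_SECS	5	/* the hard-wired 5s limit discussed above */
#define TRIES_PER_SEC		10	/* assumption: 50 tries in 5s, so 100ms per try */

/*
 * Convert a string to a wait limit in seconds.  Returns "fallback" if the
 * string is missing, empty, not a plain decimal integer, non-positive, or
 * so large that converting it to a try count would overflow an int.
 */
int
wait_secs_from_string(const char *val, int fallback)
{
	char	   *end;
	long		secs;

	if (val == NULL || *val == '\0')
		return fallback;

	errno = 0;
	secs = strtol(val, &end, 10);
	if (errno != 0 || *end != '\0' ||
		secs <= 0 || secs > INT_MAX / TRIES_PER_SEC)
		return fallback;

	return (int) secs;
}

/*
 * Number of polling attempts a wait loop would make before giving up.
 * PG_EXAMPLE_DB_WAIT_SECS is a made-up name used only for this sketch.
 */
int
max_wait_tries(void)
{
	const char *val = getenv("PG_EXAMPLE_DB_WAIT_SECS");

	return wait_secs_from_string(val, DEFAULT_WAIT_SECS) * TRIES_PER_SEC;
}
```

With the variable unset, max_wait_tries() gives the existing 50 tries; a slow test machine could export a larger value instead of relying on a special case for injection-point builds.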