Date: Tue, 19 May 2026 21:40:37 +0900 (JST)
Message-Id: <20260519.214037.579991005061650329.ishii@postgresql.org>
To: emond.papegaaij@gmail.com
Cc: pgpool-hackers@lists.postgresql.org
Subject: Re: Primary node detection race at clean startup
From: Tatsuo Ishii

Hi Emond,

> Hi,
>
> In our tests, we've found an issue that can cause all Pgpool nodes to
> report an incorrect 'Role: standby':
> Role : standby ← stale, never updated on this node
> Backend Role : primary ← actual SR-check result
>
> This can happen if all nodes in a watchdog cluster start with a clean
> state at the same time. If the first node is still trying to determine
> the primary database, its primary_node_id is -2. This value is then
> synced to other nodes in the cluster, causing all nodes to report the
> stale state indefinitely. Attached is a patch against 4.7 that should
> fix this.
>
> Note that this analysis was done by Claude Code and it also created
> the patch. The failure on our CI was real though and I think the
> explanation makes sense.

I have looked into the patch. Although I was not able to reproduce the
issue, I agree with you: the explanation makes sense. I have also run
the regression tests and all tests passed.

I am going to push the patch to all supported branches.

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp