From: Emond Papegaaij <[email protected]>
To: [email protected]
Subject: Re: Primary node detection race at clean startup
Date: Tue, 12 May 2026 12:49:31 +0200
Message-ID: <CAGXsc+bBbfW7qQ+2JJ4SY7xZDifrz329cJcxHaoG+kpfGqBJaQ@mail.gmail.com>
In-Reply-To: <CAGXsc+ZmBoLs3Mz=G-Bdm4JJG+fH1NpHfR3qVJVwW4eBKWwStQ@mail.gmail.com>
References: <CAGXsc+ZmBoLs3Mz=G-Bdm4JJG+fH1NpHfR3qVJVwW4eBKWwStQ@mail.gmail.com>
Hi,
Something was wrong with the patch attached to my previous message: it
is rejected by patch, probably because of the large context. Attached is
a new version that patch also accepts.
Best regards,
Emond Papegaaij
On Tue, 12 May 2026 at 10:38, Emond Papegaaij <[email protected]> wrote:
>
> Hi,
>
> In our tests, we've found an issue that can cause all Pgpool nodes to
> report an incorrect 'Role: standby':
> Role : standby ← stale, never updated on this node
> Backend Role : primary ← actual SR-check result
>
> This can happen if all nodes in a watchdog cluster start with a clean
> state at the same time. If the first node is still trying to determine
> the primary database, its primary_node_id is -2. This value is then
> synced to other nodes in the cluster, causing all nodes to report the
> stale state indefinitely. Attached is a patch against 4.7 that should
> fix this.
>
> Note that this analysis was done by Claude Code and it also created
> the patch. The failure on our CI was real though and I think the
> explanation makes sense.
>
> Best regards,
> Emond Papegaaij
Attachments:
[application/x-patch] pgpool-keep-local-primary-when-leader-initial.patch (2.5K, 2-pgpool-keep-local-primary-when-leader-initial.patch)
Keep local primary_node_id when leader watchdog reports the initial -2 sentinel.
When all pgpool nodes in a cluster are stopped and started simultaneously
(e.g. via an admin "restart all pgpool nodes" action), every node initialises
Req_info->primary_node_id to -2 (the sentinel) and then runs
find_primary_node_repeatedly() locally to discover the real primary. The
watchdog elects a LEADER in parallel; the losing nodes transition to STANDBY,
receive SIG_WATCHDOG_STATE_CHANGED, and call sync_backend_from_watchdog() to
pull the leader's view.
If the leader's own find_primary_node_repeatedly() has not finished yet, the
leader serialises its still-uninitialised primary_node_id (-2) and the
standby's existing protective branch only covers -1 (quarantine). The -2
falls through to the else-clause and overwrites the standby's freshly-
determined valid primary_node_id with -2.
sync_backend_from_watchdog() is only re-invoked on SIG_BACKEND_SYNC_REQUIRED,
which is raised only on WD_FAILOVER_END. No subsequent event fires after a
simultaneous restart, so the standby is stuck at -2 indefinitely.
Add a guard that keeps the local primary_node_id when the leader's value is
the -2 initial sentinel.
diff --git a/src/main/pgpool_main.c b/src/main/pgpool_main.c
--- a/src/main/pgpool_main.c
+++ b/src/main/pgpool_main.c
@@ -3729,6 +3729,27 @@ sync_backend_from_watchdog(void)
Req_info->primary_node_id, backendStatus->nodeName),
errdetail("keeping the current primary")));
}
+ else if (Req_info->primary_node_id >= 0 &&
+ backendStatus->primary_node_id == -2)
+ {
+ /*
+ * Leader watchdog is still initialising and has not yet run
+ * find_primary_node_repeatedly(); its primary_node_id is still
+ * the initial sentinel -2. Do not overwrite our locally-determined
+ * primary with the leader's stale initial state.
+ *
+ * Without this guard, a simultaneous restart of all pgpool nodes
+ * leaves every STANDBY watchdog with primary_node_id = -2 forever:
+ * sync_backend_from_watchdog() is only re-invoked on
+ * SIG_BACKEND_SYNC_REQUIRED (raised from WD_FAILOVER_END), so
+ * there is no normal path that resyncs once the leader's own
+ * find_primary_node_repeatedly() completes.
+ */
+ ereport(LOG,
+ (errmsg("primary node on leader watchdog node \"%s\" is still in the initial state",
+ backendStatus->nodeName),
+ errdetail("keeping the locally-detected primary node:%d", Req_info->primary_node_id)));
+ }
else
{
Req_info->primary_node_id = backendStatus->primary_node_id;
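For illustration only, the decision made in this hunk can be modelled as a
small standalone function. This is a sketch under assumptions, not code from
pgpool_main.c: the function name, the constant names and the with_guard
switch are hypothetical, and the condition of the existing -1 branch is
assumed to mirror the new one, since the hunk only shows the tail of it.

```c
/* Hypothetical names for the two sentinel values discussed above. */
enum
{
	PRIMARY_INITIAL = -2,		/* primary search has not finished yet */
	PRIMARY_NONE = -1			/* no usable primary (e.g. quarantine) */
};

/*
 * Model of the primary_node_id choice a STANDBY watchdog makes while
 * syncing from the leader.
 *
 * local      - value the standby determined itself
 * leader     - value received from the leader watchdog
 * with_guard - non-zero to apply the new -2 guard from the patch
 *
 * Returns the primary_node_id the standby holds after the sync.
 */
static int
sync_primary_node_id(int local, int leader, int with_guard)
{
	/* Assumed shape of the existing protective branch for -1. */
	if (local >= 0 && leader == PRIMARY_NONE)
		return local;

	/* New guard: the leader is still in its initial state. */
	if (with_guard && local >= 0 && leader == PRIMARY_INITIAL)
		return local;

	/* Otherwise the leader's view overwrites the local one. */
	return leader;
}
```

With local = 0 and leader = -2, the model returns -2 without the guard (the
stuck state described above) and 0 with it. A valid leader value such as 1
still overwrites the local one in both cases.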