public inbox for [email protected]
From: Emond Papegaaij <[email protected]>
To: [email protected]
Subject: Race condition in pcp_node_info can cause it to hang
Date: Thu, 4 Jun 2026 15:00:38 +0200
Message-ID: <CAGXsc+ZhGjwm+F42Xmt8Qn1qP_h7woipiV0WsY-e-P7W3ZG2OA@mail.gmail.com>
Hi,
We've hit another very rare flake in our tests, which can cause
pcp_node_info to hang indefinitely. I analyzed the problem with
Claude Code, which arrived at the conclusion and the (quite small) fix
below. Attached is a patch against 4.7.
The problem:
In inform_node_info() (src/pcp_con/pcp_worker.c), the code that builds
the PCP reply packet reads bi->replication_state and
bi->replication_sync_state directly from shared memory twice: once via
strlen() to compute the packet length (wsize), and again in the
pcp_write() calls that send the payload.
The streaming-replication check worker rewrites those same
shared-memory strings without a lock (it clears them to "" then
repopulates them every check cycle and on state transitions,
src/streaming_replication/pool_worker_child.c). If either string's
length changes between the two reads, the declared wsize no longer
matches the bytes actually written, so the PCP byte stream desynchronises. The
client then blocks forever in pcp_read() waiting for bytes the server
never sends.
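To make the failure mode concrete, here is a minimal single-process sketch of the buggy pattern, assuming the shrink case (the string is cleared to "" inside the race window). All names (shared_repl_state, sr_check_rewrites_state, buggy_write) are hypothetical stand-ins, not pgpool code; the "concurrent" rewrite is triggered deterministically so the mismatch is reproducible.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for bi->replication_state in shared memory. */
static char shared_repl_state[64] = "streaming";

/* Hypothetical stand-in for the sr-check worker. In the real bug it runs
 * in another process; here it fires deterministically inside the window. */
static void sr_check_rewrites_state(void)
{
    shared_repl_state[0] = '\0';    /* cleared to "" before repopulating */
}

/* Mimics the buggy pattern: strlen() on the live string twice.
 * *declared receives the length that would be counted into wsize;
 * the return value is the number of payload bytes actually copied. */
static size_t buggy_write(char *out, size_t cap, size_t *declared)
{
    size_t n;

    *declared = strlen(shared_repl_state) + 1;   /* first read: wsize */
    sr_check_rewrites_state();                   /* race window */
    n = strlen(shared_repl_state) + 1;           /* second read: payload */
    assert(n <= cap);
    memcpy(out, shared_repl_state, n);
    return n;
}
```

With "streaming" the declared length is 10 bytes but only 1 byte (the empty string's NUL) is written, so a client that trusts wsize waits for 9 bytes that never arrive.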
The fix:
Snapshot the two strings into local buffers once, right after
bi = pool_get_node_info(i), and use the locals for both the length and
the payload, so a single packet is always internally consistent. This
matches how every other field in the packet is already handled.
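A minimal sketch of the snapshot approach, again with hypothetical names (shared_state, concurrent_clear, snapshot, fixed_write) and a hypothetical STATE_LEN in place of NAMEDATALEN. The bounded copy is written out by hand as an assumption-free equivalent of the patch's strlcpy() call, since strlcpy() is not available in every libc. The copy itself may still catch a half-updated string, but whatever it captured is used for both the length and the payload, so they cannot disagree.

```c
#include <assert.h>
#include <string.h>

#define STATE_LEN 64   /* hypothetical; the patch uses NAMEDATALEN */

/* Hypothetical shared field, rewritten by another process in real code. */
static char shared_state[STATE_LEN] = "streaming";

static void concurrent_clear(void)
{
    shared_state[0] = '\0';
}

/* Copy at most cap - 1 bytes of src and always NUL-terminate dst,
 * the same guarantee strlcpy() gives in the patch. */
static void snapshot(char *dst, const char *src, size_t cap)
{
    size_t i = 0;

    assert(cap > 0);
    while (i + 1 < cap && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
}

/* Fixed pattern: snapshot once, then derive both the declared length
 * and the payload from the local copy. */
static size_t fixed_write(char *out, size_t cap, size_t *declared)
{
    char local[STATE_LEN];
    size_t n;

    snapshot(local, shared_state, sizeof(local));
    concurrent_clear();                 /* the writer races as before... */
    *declared = strlen(local) + 1;      /* ...but both reads use local */
    n = strlen(local) + 1;
    assert(n <= cap);
    memcpy(out, local, n);
    return n;
}
```

Even though shared_state is cleared between the snapshot and the write, the declared length and the bytes written are both 10, so the framing stays intact.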
Best regards,
Emond
Attachments:
[text/x-patch] pcp_node_info_hang.patch (2.5K, 2-pcp_node_info_hang.patch)
inline diff:
diff --git a/src/pcp_con/pcp_worker.c b/src/pcp_con/pcp_worker.c
index 72bf68d84..a3ed3494d 100644
--- a/src/pcp_con/pcp_worker.c
+++ b/src/pcp_con/pcp_worker.c
@@ -896,6 +896,8 @@ inform_node_info(PCP_CONNECTION *frontend, char *buf)
char standby_delay_str[20];
char standby_delay_by_time_str[4];
char status_changed_time_str[20];
+ char repl_state[NAMEDATALEN];
+ char repl_sync_state[NAMEDATALEN];
char code[] = "NodeInfo";
BackendInfo *bi = NULL;
SERVER_ROLE role;
@@ -910,6 +912,17 @@ inform_node_info(PCP_CONNECTION *frontend, char *buf)
(errmsg("informing node info failed"),
errdetail("invalid node ID")));
+ /*
+ * Snapshot the replication state strings, which the sr-check
+ * worker rewrites lock-free in shared memory. They are used
+ * for both the packet length (wsize) and the payload below;
+ * reading them live twice could make the length disagree with
+ * the bytes written and desync the PCP stream, hanging the
+ * client.
+ */
+ strlcpy(repl_state, bi->replication_state, sizeof(repl_state));
+ strlcpy(repl_sync_state, bi->replication_sync_state, sizeof(repl_sync_state));
+
snprintf(port_str, sizeof(port_str), "%d", bi->backend_port);
snprintf(status, sizeof(status), "%d", bi->backend_status);
snprintf(quarantine, sizeof(quarantine), "%d", bi->quarantine);
@@ -949,8 +962,8 @@ inform_node_info(PCP_CONNECTION *frontend, char *buf)
strlen(nodes[i].pg_role) + 1 +
strlen(standby_delay_by_time_str) + 1 +
strlen(standby_delay_str) + 1 +
- strlen(bi->replication_state) + 1 +
- strlen(bi->replication_sync_state) + 1 +
+ strlen(repl_state) + 1 +
+ strlen(repl_sync_state) + 1 +
strlen(status_changed_time_str) + 1 +
sizeof(int));
pcp_write(frontend, &wsize, sizeof(int));
@@ -965,8 +978,8 @@ inform_node_info(PCP_CONNECTION *frontend, char *buf)
pcp_write(frontend, nodes[i].pg_role, strlen(nodes[i].pg_role) + 1);
pcp_write(frontend, standby_delay_by_time_str, strlen(standby_delay_by_time_str) + 1);
pcp_write(frontend, standby_delay_str, strlen(standby_delay_str) + 1);
- pcp_write(frontend, bi->replication_state, strlen(bi->replication_state) + 1);
- pcp_write(frontend, bi->replication_sync_state, strlen(bi->replication_sync_state) + 1);
+ pcp_write(frontend, repl_state, strlen(repl_state) + 1);
+ pcp_write(frontend, repl_sync_state, strlen(repl_sync_state) + 1);
pcp_write(frontend, status_changed_time_str, strlen(status_changed_time_str) + 1);
do_pcp_flush(frontend);
}