Race condition in pcp_node_info can cause it to hang

public inbox for [email protected]  
help / color / mirror / Atom feed

Race condition in pcp_node_info can cause it to hang
4+ messages / 2 participants
[nested] [flat]

* Race condition in pcp_node_info can cause it to hang
@ 2026-06-04 13:00 Emond Papegaaij <[email protected]>
  2026-06-04 23:09 ` Re: Race condition in pcp_node_info can cause it to hang Tatsuo Ishii <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Emond Papegaaij @ 2026-06-04 13:00 UTC (permalink / raw)
  To: [email protected]

Hi,

We've hit another very rare flake in our tests, which can cause
pcp_node_info to hang indefinitely. I've analyzed the problem with
Claude Code, and it came to the conclusion and (quite small) fix
below. Attached is a patch against 4.7.

The problem:
In inform_node_info() (src/pcp_con/pcp_worker.c), the PCP reply packet
reads bi->replication_state and bi->replication_sync_state directly
from shared memory twice: once via strlen() to compute the packet
length, and once via pcp_write() to write the payload.

The streaming-replication check worker rewrites those same
shared-memory strings without a lock (it clears them to "" then
repopulates them every check cycle and on state transitions,
src/streaming_replication/pool_worker_child.c). If the string's length
changes between the two reads, the declared wsize no longer matches
the bytes actually written, so the PCP byte stream desynchronises. The
client then blocks forever in pcp_read() waiting for bytes the server
never sends.

The fix:
Snapshot the two strings into local buffers once, right after bi =
pool_get_node_info(i),
and use the locals for both the length and the payload — so a single
packet is always
internally consistent. This matches how every other field in the
packet is already
handled.

Best regards,
Emond


Attachments:

  [text/x-patch] pcp_node_info_hang.patch (2.5K, 2-pcp_node_info_hang.patch)
  download | inline diff:
diff --git a/src/pcp_con/pcp_worker.c b/src/pcp_con/pcp_worker.c
index 72bf68d84..a3ed3494d 100644
--- a/src/pcp_con/pcp_worker.c
+++ b/src/pcp_con/pcp_worker.c
@@ -896,6 +896,8 @@ inform_node_info(PCP_CONNECTION *frontend, char *buf)
 			char		standby_delay_str[20];
 			char		standby_delay_by_time_str[4];
 			char		status_changed_time_str[20];
+			char		repl_state[NAMEDATALEN];
+			char		repl_sync_state[NAMEDATALEN];
 			char		code[] = "NodeInfo";
 			BackendInfo *bi = NULL;
 			SERVER_ROLE role;
@@ -910,6 +912,17 @@ inform_node_info(PCP_CONNECTION *frontend, char *buf)
 						(errmsg("informing node info failed"),
 						 errdetail("invalid node ID")));
 
+			/*
+			 * Snapshot the replication state strings, which the sr-check
+			 * worker rewrites lock-free in shared memory.  They are used
+			 * for both the packet length (wsize) and the payload below;
+			 * reading them live twice could make the length disagree with
+			 * the bytes written and desync the PCP stream, hanging the
+			 * client.
+			 */
+			strlcpy(repl_state, bi->replication_state, sizeof(repl_state));
+			strlcpy(repl_sync_state, bi->replication_sync_state, sizeof(repl_sync_state));
+
 			snprintf(port_str, sizeof(port_str), "%d", bi->backend_port);
 			snprintf(status, sizeof(status), "%d", bi->backend_status);
 			snprintf(quarantine, sizeof(quarantine), "%d", bi->quarantine);
@@ -949,8 +962,8 @@ inform_node_info(PCP_CONNECTION *frontend, char *buf)
 						  strlen(nodes[i].pg_role) + 1 +
 						  strlen(standby_delay_by_time_str) + 1 +
 						  strlen(standby_delay_str) + 1 +
-						  strlen(bi->replication_state) + 1 +
-						  strlen(bi->replication_sync_state) + 1 +
+						  strlen(repl_state) + 1 +
+						  strlen(repl_sync_state) + 1 +
 						  strlen(status_changed_time_str) + 1 +
 						  sizeof(int));
 			pcp_write(frontend, &wsize, sizeof(int));
@@ -965,8 +978,8 @@ inform_node_info(PCP_CONNECTION *frontend, char *buf)
 			pcp_write(frontend, nodes[i].pg_role, strlen(nodes[i].pg_role) + 1);
 			pcp_write(frontend, standby_delay_by_time_str, strlen(standby_delay_by_time_str) + 1);
 			pcp_write(frontend, standby_delay_str, strlen(standby_delay_str) + 1);
-			pcp_write(frontend, bi->replication_state, strlen(bi->replication_state) + 1);
-			pcp_write(frontend, bi->replication_sync_state, strlen(bi->replication_sync_state) + 1);
+			pcp_write(frontend, repl_state, strlen(repl_state) + 1);
+			pcp_write(frontend, repl_sync_state, strlen(repl_sync_state) + 1);
 			pcp_write(frontend, status_changed_time_str, strlen(status_changed_time_str) + 1);
 			do_pcp_flush(frontend);
 		}


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Race condition in pcp_node_info can cause it to hang
  2026-06-04 13:00 Race condition in pcp_node_info can cause it to hang Emond Papegaaij <[email protected]>
@ 2026-06-04 23:09 ` Tatsuo Ishii <[email protected]>
  2026-06-05 11:49   ` Re: Race condition in pcp_node_info can cause it to hang Emond Papegaaij <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Tatsuo Ishii @ 2026-06-04 23:09 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

Hi Emond,

> Hi,
> 
> We've hit another very rare flake in our tests, which can cause
> pcp_node_info to hang indefinitely. I've analyzed the problem with
> Claude Code, and it came to the conclusion and (quite small) fix
> below. Attached is a patch against 4.7.
> 
> The problem:
> In inform_node_info() (src/pcp_con/pcp_worker.c), the PCP reply packet
> reads bi->replication_state and bi->replication_sync_state directly
> from shared memory twice: once via strlen() to compute the packet
> length, and once via pcp_write() to write the payload.
> 
> The streaming-replication check worker rewrites those same
> shared-memory strings without a lock (it clears them to "" then
> repopulates them every check cycle and on state transitions,
> src/streaming_replication/pool_worker_child.c). If the string's length
> changes between the two reads, the declared wsize no longer matches
> the bytes actually written, so the PCP byte stream desynchronises. The
> client then blocks forever in pcp_read() waiting for bytes the server
> never sends.
> 
> The fix:
> Snapshot the two strings into local buffers once, right after bi =
> pool_get_node_info(i),
> and use the locals for both the length and the payload ― so a single
> packet is always
> internally consistent. This matches how every other field in the
> packet is already
> handled.

Thank you for the report and fix. Yes, I agree there's a race
condition between sr checker process and pcp_node_info. I think
introducing a lock to protect bi->replication_state and
bi->replication_sync_state is overkill. The suggested fix seems to be
a right direction.  Will push after current release freeze is over
(supposed to be finished by the end of today).

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp






^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Race condition in pcp_node_info can cause it to hang
  2026-06-04 13:00 Race condition in pcp_node_info can cause it to hang Emond Papegaaij <[email protected]>
  2026-06-04 23:09 ` Re: Race condition in pcp_node_info can cause it to hang Tatsuo Ishii <[email protected]>
@ 2026-06-05 11:49   ` Emond Papegaaij <[email protected]>
  2026-06-07 03:14     ` Re: Race condition in pcp_node_info can cause it to hang Tatsuo Ishii <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Emond Papegaaij @ 2026-06-05 11:49 UTC (permalink / raw)
  To: Tatsuo Ishii <[email protected]>; +Cc: [email protected]

Hi,

Thanks for the quick followup!

Best regards,
Emond

Op vr 5 jun 2026 om 01:09 schreef Tatsuo Ishii <[email protected]>:
>
> Hi Emond,
>
> > Hi,
> >
> > We've hit another very rare flake in our tests, which can cause
> > pcp_node_info to hang indefinitely. I've analyzed the problem with
> > Claude Code, and it came to the conclusion and (quite small) fix
> > below. Attached is a patch against 4.7.
> >
> > The problem:
> > In inform_node_info() (src/pcp_con/pcp_worker.c), the PCP reply packet
> > reads bi->replication_state and bi->replication_sync_state directly
> > from shared memory twice: once via strlen() to compute the packet
> > length, and once via pcp_write() to write the payload.
> >
> > The streaming-replication check worker rewrites those same
> > shared-memory strings without a lock (it clears them to "" then
> > repopulates them every check cycle and on state transitions,
> > src/streaming_replication/pool_worker_child.c). If the string's length
> > changes between the two reads, the declared wsize no longer matches
> > the bytes actually written, so the PCP byte stream desynchronises. The
> > client then blocks forever in pcp_read() waiting for bytes the server
> > never sends.
> >
> > The fix:
> > Snapshot the two strings into local buffers once, right after bi =
> > pool_get_node_info(i),
> > and use the locals for both the length and the payload ― so a single
> > packet is always
> > internally consistent. This matches how every other field in the
> > packet is already
> > handled.
>
> Thank you for the report and fix. Yes, I agree there's a race
> condition between sr checker process and pcp_node_info. I think
> introducing a lock to protect bi->replication_state and
> bi->replication_sync_state is overkill. The suggested fix seems to be
> a right direction.  Will push after current release freeze is over
> (supposed to be finished by the end of today).
>
> Regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp






^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Race condition in pcp_node_info can cause it to hang
  2026-06-04 13:00 Race condition in pcp_node_info can cause it to hang Emond Papegaaij <[email protected]>
  2026-06-04 23:09 ` Re: Race condition in pcp_node_info can cause it to hang Tatsuo Ishii <[email protected]>
  2026-06-05 11:49   ` Re: Race condition in pcp_node_info can cause it to hang Emond Papegaaij <[email protected]>
@ 2026-06-07 03:14     ` Tatsuo Ishii <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Tatsuo Ishii @ 2026-06-07 03:14 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

Hi,

Fix pushed to all supported branches.
https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=7c918dc247613d16d590a9f30ecc747da6871796

Thank you!

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

> Hi,
> 
> Thanks for the quick followup!
> 
> Best regards,
> Emond
> 
> Op vr 5 jun 2026 om 01:09 schreef Tatsuo Ishii <[email protected]>:
>>
>> Hi Emond,
>>
>> > Hi,
>> >
>> > We've hit another very rare flake in our tests, which can cause
>> > pcp_node_info to hang indefinitely. I've analyzed the problem with
>> > Claude Code, and it came to the conclusion and (quite small) fix
>> > below. Attached is a patch against 4.7.
>> >
>> > The problem:
>> > In inform_node_info() (src/pcp_con/pcp_worker.c), the PCP reply packet
>> > reads bi->replication_state and bi->replication_sync_state directly
>> > from shared memory twice: once via strlen() to compute the packet
>> > length, and once via pcp_write() to write the payload.
>> >
>> > The streaming-replication check worker rewrites those same
>> > shared-memory strings without a lock (it clears them to "" then
>> > repopulates them every check cycle and on state transitions,
>> > src/streaming_replication/pool_worker_child.c). If the string's length
>> > changes between the two reads, the declared wsize no longer matches
>> > the bytes actually written, so the PCP byte stream desynchronises. The
>> > client then blocks forever in pcp_read() waiting for bytes the server
>> > never sends.
>> >
>> > The fix:
>> > Snapshot the two strings into local buffers once, right after bi =
>> > pool_get_node_info(i),
>> > and use the locals for both the length and the payload ― so a single
>> > packet is always
>> > internally consistent. This matches how every other field in the
>> > packet is already
>> > handled.
>>
>> Thank you for the report and fix. Yes, I agree there's a race
>> condition between sr checker process and pcp_node_info. I think
>> introducing a lock to protect bi->replication_state and
>> bi->replication_sync_state is overkill. The suggested fix seems to be
>> a right direction.  Will push after current release freeze is over
>> (supposed to be finished by the end of today).
>>
>> Regards,
>> --
>> Tatsuo Ishii
>> SRA OSS K.K.
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
> 
> 


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

end of thread, other threads:[~2026-06-07 03:14 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-06-04 13:00 Race condition in pcp_node_info can cause it to hang Emond Papegaaij <[email protected]>
2026-06-04 23:09 ` Tatsuo Ishii <[email protected]>
2026-06-05 11:49   ` Emond Papegaaij <[email protected]>
2026-06-07 03:14     ` Tatsuo Ishii <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox