Problem with pcp process


From: Tatsuo Ishii <[email protected]>
To: [email protected]
Subject: Problem with pcp process
Date: Sat, 25 Apr 2026 22:30:51 +0900 (JST)
Message-ID: <[email protected]> (raw)

Koshino told me off-list that the following script does not work:
-------------------------------------
pgpool_setup -n 3 --no-stop
pg_ctl -D data2 stop
while true
do
    psql -p 11000 -c "show pool_nodes" test
    if [ $? = 0 ];then
	break;
    fi
    sleep 1
done
psql -p 11000 -c "show pool_nodes" test
pcp_recovery_node -p 11001 -n 2;pcp_promote_node -p 11001 -n 2 -s -g
-------------------------------------

pcp_recovery_node reports success, but pcp_promote_node just hangs. I
found that the pcp worker process loops forever around line 584 in
pool_detach_node (pcp_worker.c):

	while (!pcp_worker_wakeup_request)
	{
		struct timeval t = {1, 0};

		select(0, NULL, NULL, NULL, &t);
	}

pcp_worker_wakeup_request is a variable that is supposed to be set to
1 by the SIGUSR2 signal handler. When pgpool main finishes a failover
request from pcp, it sends SIGUSR2 to the pcp main process, which
forwards the signal to the pcp worker process, whose signal handler
sets the variable to 1. To find the process ids to forward the signal
to, the pcp main process keeps a list of the pids of its forked
children (pcp worker processes) in its local memory.

Upon failover, pgpool main sends a signal to the pcp main process to
request a restart, and pcp main restarts. The problem is that when pcp
main restarts, it forgets the list of pids. As a result, when pgpool
main sends SIGUSR2 to pcp main, pcp main cannot find the pid to forward
the signal to, so the pcp worker process loops forever.

To fix the problem, we could delay the restart of pcp main until it
has delivered the signal. Unfortunately this does not work, because
pgpool main waits for the pcp main process to exit, so failover
processing in pgpool main would not proceed.

So I decided to add a new shared memory area that holds the pcp
worker pids as an array. When the pcp main process restarts, it reads
the pids from shared memory into its local memory. When a child
process is forked, its pid is added to the shared memory array. When a
child process exits, its slot in the array is reset to 0, which marks
the slot as empty.

Attached is a patch to implement it.

I also found a similar issue with pgpool_setup. For example,
pgpool_setup -n 3 creates 3 PostgreSQL nodes. To create the standbys,
pgpool_setup uses the pcp_recovery_node command. The first standby
creation is fine, but in the second one pcp_recovery_node actually
times out (5 seconds). pcp_recovery_node has a loop similar to the one
above, except that it times out instead of looping forever. As a
result, the second pcp_recovery_node looks as if it succeeded; it just
takes longer (5 seconds). The patch also fixes this case: the second
pcp_recovery_node now finishes quickly.
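
The actual loop in pcp_recovery_node is not shown in this message. A
bounded variant of the wait loop might look like the following sketch,
where wait_for_wakeup_timed and its timeout parameter are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <signal.h>
#include <sys/select.h>
#include <time.h>

/* Set to 1 by a SIGUSR2 handler (handler omitted in this sketch). */
static volatile sig_atomic_t wakeup_request = 0;

/*
 * Wait until the flag is set or timeout_sec seconds have passed.
 * Returns 1 if woken up by the flag, 0 if the wait timed out.
 */
static int
wait_for_wakeup_timed(int timeout_sec)
{
	time_t		deadline;

	assert(timeout_sec >= 0);
	deadline = time(NULL) + timeout_sec;

	while (!wakeup_request)
	{
		struct timeval t = {1, 0};

		if (time(NULL) >= deadline)
			return 0;			/* give up: the signal never arrived */
		select(0, NULL, NULL, NULL, &t);
	}
	return 1;
}
```

A caller that ignores the return value cannot tell a real wakeup from a
timeout, which matches the symptom described above: the command appears
to succeed and only takes longer.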

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp


Attachments:

  [text/x-patch] pcp_child_sigusr2_fix.patch (3.6K, 2-pcp_child_sigusr2_fix.patch)
diff --git a/src/include/pool.h b/src/include/pool.h
index 65907dcf1..fea5744f3 100644
--- a/src/include/pool.h
+++ b/src/include/pool.h
@@ -487,6 +487,13 @@ typedef struct
 	int			count;			/* request node ids count */
 } POOL_REQUEST_NODE;
 
+/*
+ * Maximum number of pcp worker child processes. Since a pcp worker process is
+ * forked whenever a failover/failback request is made, it should be equal to
+ * MAX_REQUEST_QUEUE_SIZE + some room. 10 is an arbitrary number.
+ */
+#define	MAX_PCP_WORKER_PIDS		(MAX_REQUEST_QUEUE_SIZE + 10)
+
 typedef struct
 {
 	POOL_REQUEST_NODE request[MAX_REQUEST_QUEUE_SIZE];
@@ -524,6 +531,12 @@ typedef struct
 	bool		query_cache_invalidate_request; /* true if
 												 * pcp_invalidate_query_cache
 												 * requested */
+
+	/*
+	 * pcp worker child pids. These are inherited by a new pcp main process
+	 * so that it can keep tracking pcp worker children after it restarts.
+	 */
+	pid_t		pcp_worker_pids[MAX_PCP_WORKER_PIDS];
 } POOL_REQUEST_INFO;
 
 /* description of row. corresponding to RowDescription message */
diff --git a/src/main/pgpool_main.c b/src/main/pgpool_main.c
index 32bcb0a1f..4112074e2 100644
--- a/src/main/pgpool_main.c
+++ b/src/main/pgpool_main.c
@@ -3200,6 +3200,8 @@ initialize_shared_mem_objects(bool clear_memcache_oidmaps)
 		wd_ipc_initialize_data();
 	}
 
+	/* initialize pcp worker child pids */
+	memset(Req_info->pcp_worker_pids, 0, sizeof(Req_info->pcp_worker_pids));
 }
 
 /*
diff --git a/src/pcp_con/pcp_child.c b/src/pcp_con/pcp_child.c
index e07c8897e..fc1cda311 100644
--- a/src/pcp_con/pcp_child.c
+++ b/src/pcp_con/pcp_child.c
@@ -154,6 +154,16 @@ pcp_main(int *fds)
 	/* We can now handle ereport(ERROR) */
 	PG_exception_stack = &local_sigjmp_buf;
 
+	/*
+	 * Restore pcp worker child pids from shmem
+	 */
+	for (int i = 0; i < MAX_PCP_WORKER_PIDS; i++)
+	{
+		pid_t	pid = Req_info->pcp_worker_pids[i];
+		if (pid != 0)
+			pcp_worker_children = lappend_int(pcp_worker_children, (int) pid);
+	}
+
 	/*
 	 * Unblock signals
 	 */
@@ -326,6 +336,7 @@ start_pcp_command_processor_process(int port, int *fds)
 	}
 	else						/* parent */
 	{
+		int		i;
 		if (pool_config->log_pcp_processes)
 			ereport(LOG,
 					(errmsg("forked new pcp worker, pid=%d socket=%d",
@@ -334,6 +345,18 @@ start_pcp_command_processor_process(int port, int *fds)
 		close(port);
 		/* Add it to the list */
 		pcp_worker_children = lappend_int(pcp_worker_children, (int) pid);
+		/* save it to shmem */
+		for (i = 0; i < MAX_PCP_WORKER_PIDS; i++)
+		{
+			if (Req_info->pcp_worker_pids[i] == 0)
+			{
+				Req_info->pcp_worker_pids[i] = pid;
+				break;
+			}
+		}
+		if (i == MAX_PCP_WORKER_PIDS)
+			ereport(WARNING,
+					(errmsg("no empty slot in pcp worker table")));
 	}
 }
 
@@ -378,6 +401,16 @@ reaper(void)
 
 	while ((pid = pool_waitpid(&status)) > 0)
 	{
+		/* remove the pid from shmem */
+		for (int i = 0; i < MAX_PCP_WORKER_PIDS; i++)
+		{
+			if (Req_info->pcp_worker_pids[i] == pid)
+			{
+				Req_info->pcp_worker_pids[i] = 0;
+				break;
+			}
+		}
+
 		if (WIFEXITED(status))
 		{
 			if (WEXITSTATUS(status) == POOL_EXIT_FATAL)
@@ -405,6 +438,17 @@ reaper(void)
 				(errmsg("going to remove pid: %d from pid list having %d elements", pid, list_length(pcp_worker_children))));
 		/* remove the pid of process from the list */
 		pcp_worker_children = list_delete_int(pcp_worker_children, pid);
+
+		/* remove the pid from shmem */
+		for (int i = 0; i < MAX_PCP_WORKER_PIDS; i++)
+		{
+			if (Req_info->pcp_worker_pids[i] == pid)
+			{
+				Req_info->pcp_worker_pids[i] = 0;
+				break;
+			}
+		}
+
 		ereport(DEBUG2,
 				(errmsg("new list have %d elements", list_length(pcp_worker_children))));
 	}
