public inbox for [email protected]
From: Tatsuo Ishii <[email protected]>
To: [email protected]
Cc: [email protected]
Subject: Re: Proposal: recent access based routing for primary-replica setups
Date: Fri, 26 Dec 2025 16:54:46 +0900 (JST)
Message-ID: <[email protected]>
In-Reply-To: <CACeKOO3QPNrU81W5kNnUdTwcY-Ld8Eu2qowiadc7M3iJ5u+w3g@mail.gmail.com>
References: <CACeKOO3fXtC2BARqU2P6Oae0PAgqqHAHskh_Xkeos+Z=9ve+xQ@mail.gmail.com>
<[email protected]>
<CACeKOO3QPNrU81W5kNnUdTwcY-Ld8Eu2qowiadc7M3iJ5u+w3g@mail.gmail.com>
Hi Nadav,
(Please disregard my previous mail; I seem to have mangled the message.)
I think I found the cause of the problem. On Linux, if SIGCHLD is
ignored (set to SIG_IGN), waitpid() cannot get the child's exit
status: the kernel reclaims the child's resources immediately so that
the child never becomes a zombie, and waitpid() then fails with
ECHILD. Since the return value of waitpid() is not checked, I did not
notice the failure (I recommend checking the return value of
waitpid()).
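For illustration, here is a minimal standalone sketch of this behavior
(not code from Pgpool-II or the patch; run_child_and_reap() is a
hypothetical helper). It forks a child, reaps it with waitpid(), and
checks the return value instead of ignoring it, retrying on EINTR.
Assuming Linux semantics, it returns the child's exit code when
SIGCHLD is SIG_DFL, and -ECHILD when SIGCHLD is SIG_IGN.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>				/* signal(): callers set SIGCHLD disposition */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits with `code`, then reap it
 * with waitpid() and check the result instead of ignoring it.
 *
 * Returns the child's exit status on success, -errno if fork() or
 * waitpid() failed (e.g. -ECHILD when SIGCHLD is SIG_IGN on Linux), or
 * -1 if the child did not exit normally.
 */
static int
run_child_and_reap(int code)
{
	pid_t		pid;
	pid_t		rc;
	int			status;

	pid = fork();
	if (pid < 0)
		return -errno;
	if (pid == 0)
		_exit(code);			/* child: exit immediately */

	/* Parent: retry if a signal handler interrupted the wait */
	do
		rc = waitpid(pid, &status, 0);
	while (rc == -1 && errno == EINTR);

	if (rc == -1)
	{
		int			save_errno = errno;

		fprintf(stderr, "waitpid() failed: %s\n", strerror(save_errno));
		return -save_errno;
	}
	if (!WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

After signal(SIGCHLD, SIG_DFL), run_child_and_reap(3) returns 3; after
signal(SIGCHLD, SIG_IGN), waitpid() blocks until the child exits and
then fails, so the helper returns -ECHILD. A similar check on the
waitpid() calls in the patch would have surfaced the failure right away.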
/* set up signal handlers */
signal(SIGALRM, SIG_DFL);
signal(SIGTERM, my_signal_handler);
signal(SIGINT, my_signal_handler);
signal(SIGHUP, reload_config_handler);
signal(SIGQUIT, my_signal_handler);
signal(SIGCHLD, SIG_IGN);	/* <--- SIGCHLD is ignored */
signal(SIGUSR1, my_signal_handler);
signal(SIGUSR2, SIG_IGN);
signal(SIGPIPE, SIG_IGN);
To fix this, either change the line above to:
signal(SIGCHLD, SIG_DFL);
or
signal(SIGCHLD, my_signal_handler);
and modify my_signal_handler.
I recommend the latter, because it does not depend on the default
behavior of SIGCHLD, which may differ between platforms.
Attached is a patch that does this (I also ran pgindent on it).
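A minimal standalone sketch of the handler approach, assuming POSIX
sigaction() (the attached patch itself keeps signal() and adds a
"case SIGCHLD: break;" branch to my_signal_handler();
sigchld_noop() and install_sigchld_handler() are hypothetical names).
Using sigaction() makes the flags explicit, which fits the goal of not
relying on platform-specific defaults.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Intentionally empty: catching SIGCHLD (rather than ignoring it) is
 * enough to keep exited children waitable until waitpid() reaps them.
 */
static void
sigchld_noop(int sig)
{
	(void) sig;
}

/* Install the handler; returns 0 on success, -1 (errno set) on failure. */
static int
install_sigchld_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigchld_noop;
	sigemptyset(&sa.sa_mask);

	/*
	 * Restart restartable system calls. SA_NOCLDWAIT is deliberately not
	 * set: it would make children unwaitable, just like SIG_IGN does.
	 */
	sa.sa_flags = SA_RESTART;
	return sigaction(SIGCHLD, &sa, NULL);
}
```

Because SIGCHLD is caught rather than ignored, an exited child stays a
zombie until it is reaped, so waitpid() returns its PID and exit status.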
I also noticed code like this:
/* Count tokens in output for validation */
char *line_copy = pstrdup(line);
char *temp_token = strtok(line_copy, " \t\n");
You should declare line_copy and temp_token at the beginning of the
code block (or in an outer block). Declaring variables at the top of a
block is the recommended coding style in Pgpool-II (and PostgreSQL).
The same applies to some other variables.
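For example, the token-counting fragment above could be written in that
style as follows. This is only a sketch: count_tokens() is a
hypothetical helper, strdup()/free() stand in for pstrdup()/pfree() so
it compiles outside Pgpool-II, and strtok_r() is my substitution for
strtok() so the scan does not disturb any other strtok() state.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: count whitespace-separated tokens in `line`
 * without modifying it. All locals are declared at the top of the
 * block. Returns -1 if the working copy cannot be allocated.
 */
static int
count_tokens(const char *line)
{
	char	   *line_copy;
	char	   *temp_token;
	char	   *saveptr;
	int			token_count = 0;

	line_copy = strdup(line);	/* pstrdup() in Pgpool-II code */
	if (line_copy == NULL)
		return -1;

	/* Count tokens on the copy; strtok_r() writes NULs into it */
	temp_token = strtok_r(line_copy, " \t\n", &saveptr);
	while (temp_token != NULL)
	{
		token_count++;
		temp_token = strtok_r(NULL, " \t\n", &saveptr);
	}

	free(line_copy);			/* pfree() in Pgpool-II code */
	return token_count;
}
```

With this layout, count_tokens("12.5 0 300\n") returns 3, and the
caller's `line` is left untouched for the later parsing pass.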
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
> Hi Tatsuo,
>
> Thank you for the note.
>
> I've removed the Docker setup and started working in an Ubuntu 24 VM to
> match your setup. Hopefully the results will be better; I had so many
> issues compiling and testing before that things weren't set up properly.
>
> Attaching the latest patch.
>
> This is what I'm seeing:
> adav@lima-dev:/src/pgpool2/src/test/regression$ PGHOST=/tmp ./regress.sh -p /usr/bin 041.external_replication_delay
> creating pgpool-II temporary installation ...
> moving pgpool_setup to temporary installation path ...
> moving watchdog_setup to temporary installation path ...
> using pgpool-II at /src/pgpool2/src/test/regression/temp/installed
> *************************
> REGRESSION MODE : install
> Pgpool-II version : pgpool-II version 4.8devel (mitsukakeboshi)
> Pgpool-II install path : /src/pgpool2/src/test/regression/temp/installed
> PostgreSQL bin : /usr/lib/postgresql/16/bin
> PostgreSQL Major version : 16
> pgbench : /usr/lib/postgresql/16/bin/pgbench
> PostgreSQL jdbc : /usr/local/pgsql/share/postgresql-9.2-1003.jdbc4.jar
> *************************
> testing 041.external_replication_delay...ok.
> out of 1 ok:1 failed:0 timeout:0
>
>
>
> On Tue, Dec 23, 2025 at 10:46 AM Tatsuo Ishii <[email protected]> wrote:
>
>> > Hi Tatsuo,
>> >
>> > I'm running into issues testing this and have created a full Docker
>> > Compose setup. Can you please point me to up-to-date guides on the best
>> > way to run the tests, so I know we're doing it the same way?
>> >
>> > Thank you for all your help!
>>
>> I have run the regression test on the Pgpool-II master branch on my
>> Ubuntu 24 box.
>>
>> cd pgpool2/src/test/regression
>> ./regress.sh 041
>>
>> This time I noticed:
>>
>> - The patch is not named with a version number.
>> - The patch creates a .dockerignore file and a docker/ directory.
>>
>> Are these intentional? I am asking because they differ from the
>> previous version.
>>
>> > On Tue, Dec 23, 2025 at 2:13 AM Tatsuo Ishii <[email protected]> wrote:
>> >
>> >> > I think everything is passing now. new version attached.
>> >>
>> >> Unfortunately Test1 did not pass.
>> >>
>> >> === Test1: Basic external command with integer millisecond values ===
>> >> waiting for server to start....1438600 2025-12-23 09:09:48.337 JST LOG: redirecting log output to logging collector process
>> >> 1438600 2025-12-23 09:09:48.337 JST HINT: Future log output will appear in directory "log".
>> >> done
>> >> server started
>> >> waiting for server to start....1438617 2025-12-23 09:09:48.443 JST LOG: redirecting log output to logging collector process
>> >> 1438617 2025-12-23 09:09:48.443 JST HINT: Future log output will appear in directory "log".
>> >> done
>> >> server started
>> >> waiting for server to start....1438634 2025-12-23 09:09:48.561 JST LOG: redirecting log output to logging collector process
>> >> 1438634 2025-12-23 09:09:48.561 JST HINT: Future log output will appear in directory "log".
>> >> done
>> >> server started
>> >> CREATE TABLE
>> >> Waiting for sr_check to run...
>> >> Command executed after 1 seconds
>> >>  node_id | hostname  | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
>> >> ---------+-----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
>> >>  0       | localhost | 11002 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2025-12-23 09:09:49
>> >>  1       | localhost | 11003 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 |                   |                        | 2025-12-23 09:09:49
>> >>  2       | localhost | 11004 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 |                   |                        | 2025-12-23 09:09:49
>> >> (3 rows)
>> >>
>> >> fail: external command delay logging not found
>> >>
>> >> > On Mon, Nov 24, 2025 at 9:41 AM Tatsuo Ishii <[email protected]> wrote:
>> >> >
>> >> >> Thank you for updating the patch! This time the patch applies without
>> >> >> any issues and compiles fine. Unfortunately, the regression test failed.
>> >> >>
>> >> >> testing 041.external_replication_delay...failed.
>> >> >>
>> >> >> From the regression log, it seems Test7 failed.
>> >> >>
>> >> >> ------------------------------------------------------------------------------
>> >> >> === Test7: Command timeout handling ===
>> >> >> waiting for server to start....411181 2025-11-24 16:31:05.244 JST LOG: redirecting log output to logging collector process
>> >> >> 411181 2025-11-24 16:31:05.244 JST HINT: Future log output will appear in directory "log".
>> >> >> done
>> >> >> server started
>> >> >> waiting for server to start....411196 2025-11-24 16:31:05.352 JST LOG: redirecting log output to logging collector process
>> >> >> 411196 2025-11-24 16:31:05.352 JST HINT: Future log output will appear in directory "log".
>> >> >> done
>> >> >> server started
>> >> >> waiting for server to start....411213 2025-11-24 16:31:05.461 JST LOG: redirecting log output to logging collector process
>> >> >> 411213 2025-11-24 16:31:05.461 JST HINT: Future log output will appear in directory "log".
>> >> >> done
>> >> >> server started
>> >> >> Waiting for command timeout...
>> >> >> fail: command timeout not detected
>> >> >>
>> >> >> ------------------------------------------------------------------------------
>> >> >>
>> >> >> Attached is the pgpool.log. If you need more info, please let me know.
>> >> >>
>> >> >> Best regards,
>> >> >>
>> >> >>
>> >> >> > Hi Tatsuo,
>> >> >> >
>> >> >> > Sorry again, this was due to splitting the work into 2 patches, and I
>> >> >> > only sent one of them.
>> >> >> >
>> >> >> > I've merged it into 1 commit and 1 patch and rebased it on master to
>> >> >> > avoid these issues moving forward.
>> >> >> >
>> >> >> > PFA latest version
>> >> >> >
>> >> >> > On Thu, Nov 20, 2025 at 1:09 AM Tatsuo Ishii <[email protected]> wrote:
>> >> >> >
>> >> >> >> Hi Nadav,
>> >> >> >>
>> >> >> >> Thank you for new patch.
>> >> >> >> Unfortunately the patch did not apply to current master.
>> >> >> >>
>> >> >> >> $ git apply ~/0001-Fix-multiple-issues-in-external-replication-delay-fe.patch
>> >> >> >> error: patch failed: src/streaming_replication/pool_worker_child.c:694
>> >> >> >> error: src/streaming_replication/pool_worker_child.c: patch does not apply
>> >> >> >>
>> >> >> >> Maybe the patch is on top of your previous patch?
>> >> >> >>
>> >> >> >> Also, I suggest using the "-v" option of "git format-patch" to add the
>> >> >> >> patch version number, so that we can easily tell which patch is the
>> >> >> >> latest.
>> >> >> >>
>> >> >> >> Best regards,
>> >> >> >>
>> >> >> >> > Hi Tatsuo,
>> >> >> >> >
>> >> >> >> > Please find attached an updated version.
>> >> >> >> >
>> >> >> >> > Thank you
>> >> >> >> >
>> >> >> >> > On Fri, Nov 7, 2025 at 2:07 AM Tatsuo Ishii <[email protected]> wrote:
>> >> >> >> >
>> >> >> >> >> > Sorry for that - thanks for the patch.
>> >> >> >> >> >
>> >> >> >> >> > Please find attached a new version
>> >> >> >> >>
>> >> >> >> >> Thanks for the new version. Unfortunately, this time the regression
>> >> >> >> >> test fails at:
>> >> >> >> >>
>> >> >> >> >> > Waiting for command timeout...
>> >> >> >> >> > fail: command timeout not detected
>> >> >> >> >>
>> >> >> >> >> Attached is the pgpool.log.
>> >> >> >> >>
>> >> >> >> >> Best regards,
>> >> >> >> >>
>> >> >> >> >> > On Mon, Nov 3, 2025 at 9:05 AM Tatsuo Ishii <[email protected]> wrote:
>> >> >> >> >> >
>> >> >> >> >> >> > Thanks, and sorry for the issues. Please find attached an updated version.
>> >> >> >> >> >>
>> >> >> >> >> >> No problem.
>> >> >> >> >> >>
>> >> >> >> >> >> This time the patch applies fine, with no compiler warnings. However,
>> >> >> >> >> >> the regression test did not pass here (on Ubuntu 24 LTS, if that
>> >> >> >> >> >> matters). So I looked into
>> >> >> >> >> >> src/test/regression/tests/041.external_replication_delay/test.sh a
>> >> >> >> >> >> little and applied the attached patch (test.sh.patch). The test got
>> >> >> >> >> >> further but failed at:
>> >> >> >> >> >>
>> >> >> >> >> >> fail: command execution failure not detected
>> >> >> >> >> >>
>> >> >> >> >> >> Please find attached
>> >> >> >> >> >> src/test/regression/tests/041.external_replication_delay/testdir/pgpool.log
>> >> >> >> >> >> and src/test/regression/log/041.external_replication_delay.
>> >> >> >> >> >>
>> >> >> >> >> >> Best regards,
>> >> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> > --
>> >> >> >> >> > Nadav Shatz
>> >> >> >> >> > Tailor Brands | CTO
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >>
>> >
>> >
>>
>
>
Attachments:
[application/octet-stream] extenal_delay_cmd.patch (8.2K, 2-extenal_delay_cmd.patch)
diff --git a/src/streaming_replication/pool_worker_child.c b/src/streaming_replication/pool_worker_child.c
index 457d0fab0..c509ba5bc 100644
--- a/src/streaming_replication/pool_worker_child.c
+++ b/src/streaming_replication/pool_worker_child.c
@@ -132,7 +132,7 @@ do_worker_child(void *params)
signal(SIGINT, my_signal_handler);
signal(SIGHUP, reload_config_handler);
signal(SIGQUIT, my_signal_handler);
- signal(SIGCHLD, SIG_IGN);
+ signal(SIGCHLD, my_signal_handler);
signal(SIGUSR1, my_signal_handler);
signal(SIGUSR2, SIG_IGN);
signal(SIGPIPE, SIG_IGN);
@@ -262,16 +262,20 @@ do_worker_child(void *params)
POOL_NODE_STATUS *node_status;
int i;
- /* Do replication time lag checking */
- /* Use external command if replication_delay_source_cmd is configured */
- if (pool_config->replication_delay_source_cmd &&
- strlen(pool_config->replication_delay_source_cmd) > 0)
- check_replication_time_lag_with_cmd();
- else
- check_replication_time_lag();
+ /* Do replication time lag checking */
- /* Check node status */
- node_status = verify_backend_node_status(slots);
+ /*
+ * Use external command if replication_delay_source_cmd is
+ * configured
+ */
+ if (pool_config->replication_delay_source_cmd &&
+ strlen(pool_config->replication_delay_source_cmd) > 0)
+ check_replication_time_lag_with_cmd();
+ else
+ check_replication_time_lag();
+
+ /* Check node status */
+ node_status = verify_backend_node_status(slots);
for (i = 0; i < NUM_BACKENDS; i++)
@@ -668,7 +672,7 @@ check_replication_time_lag(void)
}
#define MAX_CMD_OUTPUT 4096
-#define MAX_REASONABLE_DELAY_MS 3600000.0 /* 1 hour in milliseconds */
+#define MAX_REASONABLE_DELAY_MS 3600000.0 /* 1 hour in milliseconds */
/*
* Check replication time lag using external command
@@ -680,23 +684,23 @@ check_replication_time_lag(void)
static void
check_replication_time_lag_with_cmd(void)
{
- char *command = NULL;
- char *line;
- char *token;
- char *saveptr;
- double delay_ms;
- uint64 delay;
- int token_count = 0;
- BackendInfo *bkinfo;
+ char *command = NULL;
+ char *line;
+ char *token;
+ char *saveptr;
+ double delay_ms;
+ uint64 delay;
+ int token_count = 0;
+ BackendInfo *bkinfo;
ErrorContextCallback callback;
- int pipefd[2] = {-1, -1};
- pid_t pid = -1;
- int ret;
- struct timeval timeout;
- fd_set readfds;
- ssize_t bytes_read;
- int status;
- int num_replicas;
+ int pipefd[2] = {-1, -1};
+ pid_t pid = -1;
+ int ret;
+ struct timeval timeout;
+ fd_set readfds;
+ ssize_t bytes_read;
+ int status;
+ int num_replicas;
if (NUM_BACKENDS <= 1)
{
@@ -717,7 +721,7 @@ check_replication_time_lag_with_cmd(void)
}
/* Capture primary node ID to avoid race conditions during execution */
- int primary_node_id = REAL_PRIMARY_NODE_ID;
+ int primary_node_id = REAL_PRIMARY_NODE_ID;
if (!pool_config->replication_delay_source_cmd ||
strlen(pool_config->replication_delay_source_cmd) == 0)
@@ -746,16 +750,21 @@ check_replication_time_lag_with_cmd(void)
PG_TRY();
{
const char *base_command = pool_config->replication_delay_source_cmd;
- size_t total_len = strlen(base_command) + 1; /* +1 for NUL */
+ size_t total_len = strlen(base_command) + 1; /* +1 for NUL */
/* Build command with replica-only arguments (omit primary) */
- /* Calculate total command length including space-separated replica identifiers */
+
+ /*
+ * Calculate total command length including space-separated replica
+ * identifiers
+ */
for (int i = 0; i < NUM_BACKENDS; i++)
{
if (i == primary_node_id)
- continue; /* Skip primary node */
+ continue; /* Skip primary node */
+
+ char *ident = build_instance_identifier_for_node(i);
- char *ident = build_instance_identifier_for_node(i);
total_len += 1 /* space */ + strlen(ident);
pfree(ident);
}
@@ -764,13 +773,14 @@ check_replication_time_lag_with_cmd(void)
strlcpy(command, base_command, total_len);
/* Append replica identifiers */
- size_t current_len = strlen(command);
+ size_t current_len = strlen(command);
+
for (int i = 0; i < NUM_BACKENDS; i++)
{
if (i == primary_node_id)
- continue; /* Skip primary node */
+ continue; /* Skip primary node */
- char *ident = build_instance_identifier_for_node(i);
+ char *ident = build_instance_identifier_for_node(i);
/* Append space and identifier */
snprintf(command + current_len, total_len - current_len, " %s", ident);
@@ -800,16 +810,16 @@ check_replication_time_lag_with_cmd(void)
if (pid == 0)
{
/* Child process */
- close(pipefd[0]); /* Close read end */
+ close(pipefd[0]); /* Close read end */
if (dup2(pipefd[1], STDOUT_FILENO) == -1)
{
fprintf(stderr, "dup2 failed: %s\n", strerror(errno));
exit(1);
}
- close(pipefd[1]); /* Close write end (duplicated to stdout) */
+ close(pipefd[1]); /* Close write end (duplicated to stdout) */
/* Execute command using shell */
- execl("/bin/sh", "sh", "-c", command, (char *)NULL);
+ execl("/bin/sh", "sh", "-c", command, (char *) NULL);
/* If execl fails */
fprintf(stderr, "execl failed: %s\n", strerror(errno));
@@ -817,7 +827,7 @@ check_replication_time_lag_with_cmd(void)
}
/* Parent process */
- close(pipefd[1]); /* Close write end */
+ close(pipefd[1]); /* Close write end */
pipefd[1] = -1;
/* Set up timeout for select */
@@ -832,7 +842,8 @@ check_replication_time_lag_with_cmd(void)
if (ret == -1)
{
- int save_errno = errno;
+ int save_errno = errno;
+
kill(pid, SIGKILL);
waitpid(pid, NULL, 0);
pid = -1;
@@ -913,11 +924,12 @@ check_replication_time_lag_with_cmd(void)
bkinfo->standby_delay_by_time = true;
/* Count expected replicas */
- num_replicas = NUM_BACKENDS - 1; /* Total nodes minus primary */
+ num_replicas = NUM_BACKENDS - 1; /* Total nodes minus primary */
/* Count tokens in output for validation */
- char *line_copy = pstrdup(line);
- char *temp_token = strtok(line_copy, " \t\n");
+ char *line_copy = pstrdup(line);
+ char *temp_token = strtok(line_copy, " \t\n");
+
while (temp_token != NULL)
{
token_count++;
@@ -953,7 +965,7 @@ check_replication_time_lag_with_cmd(void)
for (int i = 0; i < NUM_BACKENDS && token != NULL; i++)
{
if (i == primary_node_id)
- continue; /* Skip primary - it's not in the output */
+ continue; /* Skip primary - it's not in the output */
if (!VALID_BACKEND(i))
{
@@ -962,7 +974,8 @@ check_replication_time_lag_with_cmd(void)
continue;
}
- char *endptr;
+ char *endptr;
+
delay_ms = strtod(token, &endptr);
/* Validate the conversion */
@@ -1002,13 +1015,18 @@ check_replication_time_lag_with_cmd(void)
delay_ms, i)));
}
- /* Convert delay from milliseconds to microseconds for internal storage */
- delay = (uint64)(delay_ms * 1000);
+ /*
+ * Convert delay from milliseconds to microseconds for internal
+ * storage
+ */
+ delay = (uint64) (delay_ms * 1000);
bkinfo->standby_delay = delay;
bkinfo->standby_delay_by_time = true;
/* Log delay if necessary */
- uint64 delay_threshold_by_time = pool_config->delay_threshold_by_time * 1000; /* threshold is in milliseconds, convert to microseconds */
+ uint64 delay_threshold_by_time = pool_config->delay_threshold_by_time * 1000; /* threshold is in
+ * milliseconds, convert
+ * to microseconds */
if ((pool_config->log_standby_delay == LSD_ALWAYS && delay_ms > 0) ||
(pool_config->log_standby_delay == LSD_OVER_THRESHOLD &&
@@ -1026,12 +1044,15 @@ check_replication_time_lag_with_cmd(void)
PG_CATCH();
{
/* Cleanup in case of error */
- if (pid > 0) {
+ if (pid > 0)
+ {
kill(pid, SIGKILL);
waitpid(pid, NULL, 0);
}
- if (pipefd[0] != -1) close(pipefd[0]);
- if (pipefd[1] != -1) close(pipefd[1]);
+ if (pipefd[0] != -1)
+ close(pipefd[0]);
+ if (pipefd[1] != -1)
+ close(pipefd[1]);
if (line)
pfree(line);
@@ -1137,6 +1158,9 @@ static RETSIGTYPE my_signal_handler(int sig)
restart_request = 1;
break;
+ case SIGCHLD:
+ break;
+
default:
exit(1);
break;