From: Fujii Masao <[email protected]>
To: PostgreSQL Hackers <[email protected]>
Subject: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion?
Date: Thu, 19 Mar 2026 01:05:29 +0900
Message-ID: <CAHGQGwFzNYroAxSoyJhqTU-pH=t4Ej6RyvhVmBZ91Exj_TPMMQ@mail.gmail.com>
Hi,
I noticed that during standby promotion the startup process sends SIGUSR1 to
the slotsync worker to make it exit. Is there a reason for using SIGUSR1?
If the slotsync worker is blocked waiting for input from the primary (e.g.,
due to a network outage between the primary and standby), SIGUSR1 won't
interrupt the wait. As a result, the worker can remain stuck and delay
promotion for a long time.
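
The behaviour described above can be pictured with a small standalone model. This is only a sketch under stated assumptions, not PostgreSQL's latch or interrupt code: it assumes the wait loop reacts only to a termination flag, and the names stop_requested, die_pending, install_handlers() and wait_for_input() are hypothetical. SIGUSR1 wakes the process and records a stop request, but the loop keeps waiting; SIGTERM sets the flag that the loop actually checks.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flags for this sketch; not PostgreSQL's actual variables. */
static volatile sig_atomic_t stop_requested = 0;	/* set on SIGUSR1 */
static volatile sig_atomic_t die_pending = 0;		/* set on SIGTERM */

static void
on_sigusr1(int signo)
{
	(void) signo;
	stop_requested = 1;
}

static void
on_sigterm(int signo)
{
	(void) signo;
	die_pending = 1;
}

/* Install both handlers without SA_RESTART, so poll() returns on a signal. */
void
install_handlers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);

	sa.sa_handler = on_sigusr1;
	sigaction(SIGUSR1, &sa, NULL);

	sa.sa_handler = on_sigterm;
	sigaction(SIGTERM, &sa, NULL);
}

/*
 * Model of "waiting for input from the primary". The loop wakes up on any
 * signal or timeout, but it only stops waiting when die_pending is set.
 * stop_requested is deliberately not consulted here (assumption of this
 * sketch). Returns true if the wait was abandoned because of die_pending,
 * false if the wakeup budget ran out while still waiting.
 */
bool
wait_for_input(int max_wakeups, int timeout_ms)
{
	assert(max_wakeups >= 0 && timeout_ms >= 0);

	for (int i = 0; i < max_wakeups; i++)
	{
		if (die_pending)
			return true;

		/* No descriptors: this just sleeps until a timeout or a signal. */
		(void) poll(NULL, 0, timeout_ms);
	}

	return die_pending != 0;
}
```

In this model, calling install_handlers(), then raise(SIGUSR1), leaves wait_for_input() waiting until its budget is exhausted, while raise(SIGTERM) makes the next call return at once.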
Would it make sense to send SIGTERM instead, so the worker can exit promptly
even while waiting? I've attached a WIP patch that does this. I haven't updated
the source comments yet, but I can do so if we agree on the approach.
SIGTERM alone is not sufficient, though. A new slotsync worker could start
immediately after the old one exits and block promotion again. To address this,
the patch makes a newly started worker exit immediately if promotion is
in progress.
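
The startup check can be sketched in isolation as follows. This is a simplified model, not the patch itself: SyncCtx, StartResult, signal_promotion() and try_start_sync() are hypothetical names, and a C11 atomic_flag stands in for the spinlock. The point it illustrates is that the "promotion in progress" flag is tested under the same lock that guards the "already syncing" state, so a worker launched after the stop request never registers itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state, loosely modelled on the fields named above. */
typedef struct SyncCtx
{
	atomic_flag lock;			/* stand-in for a spinlock */
	bool		stop_signaled;	/* promotion has been requested */
	bool		syncing;		/* a sync process is already registered */
} SyncCtx;

typedef enum StartResult
{
	START_OK,					/* registered; caller may start syncing */
	START_EXIT_PROMOTION,		/* promotion in progress; caller should exit */
	START_ALREADY_SYNCING		/* another sync process is active */
} StartResult;

static void
ctx_lock(SyncCtx *ctx)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&ctx->lock, memory_order_acquire))
		;
}

static void
ctx_unlock(SyncCtx *ctx)
{
	atomic_flag_clear_explicit(&ctx->lock, memory_order_release);
}

void
sync_ctx_init(SyncCtx *ctx)
{
	assert(ctx != NULL);
	atomic_flag_clear(&ctx->lock);
	ctx->stop_signaled = false;
	ctx->syncing = false;
}

/* Called by the promoting side before it signals the running worker. */
void
signal_promotion(SyncCtx *ctx)
{
	assert(ctx != NULL);
	ctx_lock(ctx);
	ctx->stop_signaled = true;
	ctx_unlock(ctx);
}

/* Called by a newly started worker before it does any real work. */
StartResult
try_start_sync(SyncCtx *ctx)
{
	StartResult result;

	assert(ctx != NULL);
	ctx_lock(ctx);
	if (ctx->stop_signaled)
		result = START_EXIT_PROMOTION;	/* checked first, as in the proposal */
	else if (ctx->syncing)
		result = START_ALREADY_SYNCING;
	else
	{
		ctx->syncing = true;
		result = START_OK;
	}
	ctx_unlock(ctx);

	return result;
}
```

A worker that gets START_EXIT_PROMOTION would simply exit without touching the shared state, which is the behaviour the patch aims for.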
Thoughts?
Regards,
--
Fujii Masao
Attachments:
[application/octet-stream] v1-0001-Use-SIGTERM-to-stop-slotsync-worker-during-standb.patch (2.7K, 2-v1-0001-Use-SIGTERM-to-stop-slotsync-worker-during-standb.patch)
From cdcce240fcabf3c4f91ad6931b451bc77db76caf Mon Sep 17 00:00:00 2001
From: Fujii Masao <[email protected]>
Date: Thu, 19 Mar 2026 00:50:07 +0900
Subject: [PATCH v1] Use SIGTERM to stop slotsync worker during standby
promotion
Previously, when standby promotion was requested, the startup process sent
SIGUSR1 to the slotsync worker (or a backend performing slot synchronization)
to make it exit. This generally worked, but if the slotsync worker was blocked
waiting for input from the primary, SIGUSR1 would not interrupt the wait.
As a result, the worker could remain stuck, preventing promotion from
completing for a long time.
This commit fixes the issue by having the startup process send SIGTERM
instead of SIGUSR1, allowing the slotsync worker (or backend) to exit promptly
even while waiting for input.
Additionally, a new slotsync worker could launch immediately after the old one
was terminated, which could again block promotion. To prevent this, a slotsync
worker that starts up during promotion now detects this condition and exits
immediately.
This ensures that standby promotion is not delayed by stuck or newly started
slotsync workers.
---
src/backend/replication/logical/slotsync.c | 29 ++++++----------------
1 file changed, 7 insertions(+), 22 deletions(-)
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index e75db69e3f6..3d18ff4a66e 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -1298,27 +1298,6 @@ ProcessSlotSyncInterrupts(void)
 {
 	CHECK_FOR_INTERRUPTS();
 
-	if (SlotSyncCtx->stopSignaled)
-	{
-		if (AmLogicalSlotSyncWorkerProcess())
-		{
-			ereport(LOG,
-					errmsg("replication slot synchronization worker will stop because promotion is triggered"));
-
-			proc_exit(0);
-		}
-		else
-		{
-			/*
-			 * For the backend executing SQL function
-			 * pg_sync_replication_slots().
-			 */
-			ereport(ERROR,
-					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-					errmsg("replication slot synchronization will stop because promotion is triggered"));
-		}
-	}
-
 	if (ConfigReloadPending)
 		slotsync_reread_config();
 }
@@ -1427,6 +1406,12 @@ check_and_set_sync_info(pid_t sync_process_pid)
 {
 	SpinLockAcquire(&SlotSyncCtx->mutex);
 
+	if (SlotSyncCtx->stopSignaled)
+	{
+		SpinLockRelease(&SlotSyncCtx->mutex);
+		proc_exit(0);
+	}
+
 	if (SlotSyncCtx->syncing)
 	{
 		SpinLockRelease(&SlotSyncCtx->mutex);
@@ -1752,7 +1737,7 @@ ShutDownSlotSync(void)
 	 * detecting that the stopSignaled flag is set to true.
 	 */
 	if (sync_process_pid != InvalidPid)
-		kill(sync_process_pid, SIGUSR1);
+		kill(sync_process_pid, SIGTERM);
 
 	/* Wait for slot sync to end */
 	for (;;)
--
2.51.2