From: Zhijie Hou (Fujitsu) <[email protected]>
To: PostgreSQL Hackers <[email protected]>
Subject: Fix race in ReplicationSlotRelease for ephemeral slots
Date: Wed, 27 May 2026 11:50:16 +0000
Message-ID: <TY4PR01MB177184FF9EE916F577E1F554194082@TY4PR01MB17718.jpnprd01.prod.outlook.com> (raw)
Hi,
While testing the slot release logic, I noticed a bug in
ReplicationSlotRelease(): it may access a replication slot array entry that it
has already released itself.
The details are as follows. When releasing an ephemeral replication slot,
ReplicationSlotRelease() first drops the slot via ReplicationSlotDropAcquired().
After this point, the slot's shared memory slot array entry can be immediately
reused by another backend creating a new slot.
However, ReplicationSlotRelease() continued executing common cleanup code that
still dereferenced the old slot pointer and updated shared memory fields such as
effective_xmin. If the slot array entry had already been reallocated, these
writes could inadvertently affect a different, unrelated slot.
I am attaching a patch that avoids touching the slot's shared-memory state after
dropping an ephemeral slot. The post-release shared-memory updates are kept only
for non-ephemeral slots, where the slot remains valid after release.
To reproduce, we can use the following steps:
1. Attach gdb to the backend and set a breakpoint in ReplicationSlotRelease()
right after ReplicationSlotDropAcquired() is called.
2. In that backend, create an ephemeral slot with a nonexistent output plugin, so
that creation fails and the ephemeral slot is dropped while being released:
SELECT pg_create_logical_replication_slot('test_slot_dropped', 'pgoutput2', false, false, true);
3. Once the breakpoint is hit, start another backend and create a new slot
named 'test_slot_created'.
4. Continue past the breakpoint and let the first backend resume. At this
point, you will see it set active_proc of the new slot 'test_slot_created'
(and its effective_xmin, if a snapshot is being exported) to invalid values.
5. Start a third backend and attempt to acquire the same slot
'test_slot_created'. This should not be possible under normal circumstances,
but the bug allows it.
I haven't attached a test for this fix, as the change is straightforward and
this bug is unlikely to be hit in practice, so it may not be worth the extra
test cycles. However, if others feel differently, I'm happy to add one.
Best Regards,
Hou zj
Attachments:
[application/octet-stream] v1-0001-Fix-race-in-ReplicationSlotRelease-for-ephemeral-.patch (3.9K, 2-v1-0001-Fix-race-in-ReplicationSlotRelease-for-ephemeral-.patch)
From d58d49e585abf4f1c2cc29de172dcb33595017d7 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Wed, 27 May 2026 18:24:39 +0800
Subject: [PATCH v1] Fix race in ReplicationSlotRelease for ephemeral slots
When releasing an ephemeral replication slot, ReplicationSlotRelease() first
drops the slot via ReplicationSlotDropAcquired(). After this point, the slot's
shared memory slot array entry can be immediately reused by another backend
creating a new slot.
However, ReplicationSlotRelease() continued executing common cleanup code that
still dereferenced the old slot pointer and updated shared memory fields such as
effective_xmin. If the slot array entry had already been reallocated, these
writes could inadvertently affect a different, unrelated slot.
This commit avoids touching slot shared-memory state after dropping an
ephemeral slot, and keeps the post-release shared-memory updates only for
non-ephemeral slots, where the slot remains valid after release.
---
src/backend/replication/slot.c | 68 +++++++++++++++++-----------------
1 file changed, 35 insertions(+), 33 deletions(-)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index c0c9f514f7b..37745867930 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -797,44 +797,46 @@ ReplicationSlotRelease(void)
if (is_logical)
RequestDisableLogicalDecoding();
}
-
- /*
- * If slot needed to temporarily restrain both data and catalog xmin to
- * create the catalog snapshot, remove that temporary constraint.
- * Snapshots can only be exported while the initial snapshot is still
- * acquired.
- */
- if (!TransactionIdIsValid(slot->data.xmin) &&
- TransactionIdIsValid(slot->effective_xmin))
+ else
{
- SpinLockAcquire(&slot->mutex);
- slot->effective_xmin = InvalidTransactionId;
- SpinLockRelease(&slot->mutex);
- ReplicationSlotsComputeRequiredXmin(false);
- }
-
- /*
- * Set the time since the slot has become inactive. We get the current
- * time beforehand to avoid system call while holding the spinlock.
- */
- now = GetCurrentTimestamp();
+ /*
+ * If slot needed to temporarily restrain both data and catalog xmin
+ * to create the catalog snapshot, remove that temporary constraint.
+ * Snapshots can only be exported while the initial snapshot is still
+ * acquired.
+ */
+ if (!TransactionIdIsValid(slot->data.xmin) &&
+ TransactionIdIsValid(slot->effective_xmin))
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->effective_xmin = InvalidTransactionId;
+ SpinLockRelease(&slot->mutex);
+ ReplicationSlotsComputeRequiredXmin(false);
+ }
- if (slot->data.persistency == RS_PERSISTENT)
- {
/*
- * Mark persistent slot inactive. We're not freeing it, just
- * disconnecting, but wake up others that may be waiting for it.
+ * Set the time since the slot has become inactive. We get the current
+ * time beforehand to avoid system call while holding the spinlock.
*/
- SpinLockAcquire(&slot->mutex);
- slot->active_proc = INVALID_PROC_NUMBER;
- ReplicationSlotSetInactiveSince(slot, now, false);
- SpinLockRelease(&slot->mutex);
- ConditionVariableBroadcast(&slot->active_cv);
- }
- else
- ReplicationSlotSetInactiveSince(slot, now, true);
+ now = GetCurrentTimestamp();
- MyReplicationSlot = NULL;
+ if (slot->data.persistency == RS_PERSISTENT)
+ {
+ /*
+ * Mark persistent slot inactive. We're not freeing it, just
+ * disconnecting, but wake up others that may be waiting for it.
+ */
+ SpinLockAcquire(&slot->mutex);
+ slot->active_proc = INVALID_PROC_NUMBER;
+ ReplicationSlotSetInactiveSince(slot, now, false);
+ SpinLockRelease(&slot->mutex);
+ ConditionVariableBroadcast(&slot->active_cv);
+ }
+ else
+ ReplicationSlotSetInactiveSince(slot, now, true);
+
+ MyReplicationSlot = NULL;
+ }
/* might not have been set when we've been a plain slot */
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
--
2.43.0