public inbox for [email protected]  
help / color / mirror / Atom feed
From: Xuneng Zhou <[email protected]>
To: Fujii Masao <[email protected]>
To: Amit Kapila <[email protected]>
Cc: Zhijie Hou (Fujitsu) <[email protected]>
Cc: Srinath Reddy Sadipiralla <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Fix race in ReplicationSlotRelease for ephemeral slots
Date: Fri, 12 Jun 2026 10:52:01 +0800
Message-ID: <CABPTF7WBh_mKi60EYLiueaZ_cdJvnrOrpSt3hQkuZ_uY4w5duA@mail.gmail.com> (raw)
In-Reply-To: <CAHGQGwG_3ff4HciHtTZ_uMvbJgSDWsz4Yawj_zQpDG6Yj=Mjng@mail.gmail.com>
References: <TY4PR01MB177184FF9EE916F577E1F554194082@TY4PR01MB17718.jpnprd01.prod.outlook.com>
	<CAFC+b6o-hD5VxVLZQovmHSYykF8Qzq3eiuBU-U1F_yR9-y6P_w@mail.gmail.com>
	<TY4PR01MB177180A7CE60BCDF286B1C6F594172@TY4PR01MB17718.jpnprd01.prod.outlook.com>
	<CABPTF7VyH1-W2xnDspECDEzFGQj=WTFpZBCqKfM11OAZa6gQHQ@mail.gmail.com>
	<CAHGQGwE+2WSqiAYgNJRkf_twdB+uRGozjjGhUn76vUKZ8dzbSA@mail.gmail.com>
	<CABPTF7VeA8szPv7LYDVY9_7LftV-HM8NFVQR2natPKmr73JW+A@mail.gmail.com>
	<TY4PR01MB1771887D33612C5A45F7E9CDF941E2@TY4PR01MB17718.jpnprd01.prod.outlook.com>
	<CAA4eK1LqFBKCkX2eoX3iQPxJJnzWTaCpdh9zNotxuoG8BgjdtA@mail.gmail.com>
	<CAA4eK1LkRdbm5XA=qa82Rp_y4rnyJh8pypMWVqOezOZpzy=Oaw@mail.gmail.com>
	<CAHGQGwG_3ff4HciHtTZ_uMvbJgSDWsz4Yawj_zQpDG6Yj=Mjng@mail.gmail.com>

Hi Fujii-san, Amit,

The issues look real to me and could be dealt with patch v1 partially.

On Thu, Jun 11, 2026 at 9:19 PM Fujii Masao <[email protected]> wrote:
>
> On Thu, Jun 11, 2026 at 8:18 PM Amit Kapila <[email protected]> wrote:
> > 1. Stale name read in local_sync_slot_required(): The reused cell
> > holds a different name. local_sync_slot_required() might return false
> > (drop needed). But then the in_use && synced spinlock check sees
> > synced = false and skips the actual drop. The wrong decision is
> > caught.
>
> Yes, we could skip the actual drop. But then wouldn't we still emit
> the log message "dropped replication slot ..." even though no slot was
> actually dropped?

With v1, we won't emit the log message unless the log is factually
dropped. However it did not prevent the stale read in
local_sync_slot_required().

> > 2. Wrong database OID read at line 551: The reused cell holds OID_B
> > from the new slot. We lock OID_B, then at lines 563–565 we see synced
> > = false, skip the drop, and unlock OID_B at line 579. Since no drop
> > occurred, the cell is still the same non-synced slot, so the lock and
> > unlock see the same OID_B. Symmetric — no lock leak.
>
> What happens if the slot for OID_B is dropped after we lock
> OID_B, and then a new slot for OID_C reuses the same array entry? In
> that case, wouldn't the later unlock read OID_C from
> local_slot->data.database even though the lock was originally taken on
> OID_B?

V1 stops doing the venerable second read of local_slot->data.database.
So if the copied value was already stale and points to OID_B, v1 is at
least symmetric:

read OID_B once
lock OID_B
cell reused as OID_C
unlock OID_B

But v1 seems not to fully  solve issue 1.

It can still do this:

cell already reused before slot_database is copied
v1 copies OID_B from replacement slot
locks OID_B
recheck sees synced=false
skips drop
unlocks OID_B

That is still a stale read and possibly a wasted/wrong database lock,
but it doesn't leak the lock, unlocks the wrong object, logs a false
drop, or drops the replacement slot.

In an off-list chat with Zhijie, we kinda thought that holding the
lock of a wrong db for a brief time doesn't seem to harm a lot. The
concurrent dropping-db operation leads to this issue seems rare in
practice. He stated that the deletion of the slot seems unavoidable
because we have to acquire the database lock after releasing the
replication slot lock to avoid the deadlock with the startup/drop db
operation. Therefore, he prefered keeping the design simple and
avoiding the fatal issue over doing a broader refactoring work. I
don't have a strong opinion on this. Still attaching the refactoring
patch to do some clean-up in case someone thinks it's worthwhile.


--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.


Attachments:

  [application/octet-stream] v1-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patch (3.0K, 2-v1-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patch)
  download | inline diff:
From e920adcb5c01510f94ee75f26133e0efa6089038 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 2 Jun 2026 13:14:54 +0800
Subject: [PATCH v1] Avoid stale slot access after dropping obsolete synced
 slots

drop_local_obsolete_slots() kept using local_slot after calling
ReplicationSlotDropAcquired().  Once the drop completes, the slot array entry can
be reused by another backend, so later reads of local_slot->data could refer to a
different slot.

Copy the slot name and database OID before dropping the slot, and use those
saved values for unlocking and logging after the drop.
---
 src/backend/replication/logical/slotsync.c | 25 ++++++++++++++--------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 93f41be32af..ea73f0aa262 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -541,6 +541,9 @@ drop_local_obsolete_slots(List *remote_slot_list)
 		/* Drop the local slot if it is not required to be retained. */
 		if (!local_sync_slot_required(local_slot, remote_slot_list))
 		{
+			bool		dropped = false;
+			NameData	slot_name = {0};
+			Oid			slot_database = local_slot->data.database;
 			bool		synced_slot;
 
 			/*
@@ -548,8 +551,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
 			 * ReplicationSlotsDropDBSlots(), trying to drop the same slot
 			 * during a drop-database operation.
 			 */
-			LockSharedObject(DatabaseRelationId, local_slot->data.database,
-							 0, AccessShareLock);
+			LockSharedObject(DatabaseRelationId, slot_database, 0,
+							 AccessShareLock);
 
 			/*
 			 * In the small window between getting the slot to drop and
@@ -562,6 +565,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
 			 */
 			SpinLockAcquire(&local_slot->mutex);
 			synced_slot = local_slot->in_use && local_slot->data.synced;
+			if (synced_slot)
+				slot_name = local_slot->data.name;
 			SpinLockRelease(&local_slot->mutex);
 
 			if (synced_slot)
@@ -572,17 +577,19 @@ drop_local_obsolete_slots(List *remote_slot_list)
 				 * a standby, which derives its logical decoding state from
 				 * the primary, it would be wrong to do so.
 				 */
-				ReplicationSlotAcquire(NameStr(local_slot->data.name), true, false);
+				ReplicationSlotAcquire(NameStr(slot_name), true, false);
 				ReplicationSlotDropAcquired(false);
+				dropped = true;
 			}
 
-			UnlockSharedObject(DatabaseRelationId, local_slot->data.database,
-							   0, AccessShareLock);
+			UnlockSharedObject(DatabaseRelationId, slot_database, 0,
+							   AccessShareLock);
 
-			ereport(LOG,
-					errmsg("dropped replication slot \"%s\" of database with OID %u",
-						   NameStr(local_slot->data.name),
-						   local_slot->data.database));
+			if (dropped)
+				ereport(LOG,
+						errmsg("dropped replication slot \"%s\" of database with OID %u",
+							   NameStr(slot_name),
+							   slot_database));
 		}
 	}
 }
-- 
2.51.0



  [application/octet-stream] v1-0002-Avoid-stale-slot-pointers-in-slotsync-cleanup.patch (10.5K, 3-v1-0002-Avoid-stale-slot-pointers-in-slotsync-cleanup.patch)
  download | inline diff:
From c741e5c1205efedcc6841c85ab0539832c658578 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Fri, 12 Jun 2026 10:27:04 +0800
Subject: [PATCH v1] Avoid stale slot pointers in slotsync cleanup

drop_local_obsolete_slots() kept raw ReplicationSlot * values after
scanning the shared slot array. Once ReplicationSlotControlLock was
released, those array entries could be dropped and reused before the
later retention check or database-lock/drop path.

Copy the local synced slot identity while scanning the array, and carry
those copied values instead of raw slot pointers. Revalidate the copied
identity before acquiring and dropping the slot, and use the copied
name and database OID for acquire, unlock, and logging.

This makes the cleanup path avoid depending on slot array entries
remaining stable across unlocked windows.
---
 src/backend/replication/logical/slotsync.c | 205 +++++++++++++++++----
 src/tools/pgindent/typedefs.list           |   1 +
 2 files changed, 167 insertions(+), 39 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index ea73f0aa262..a323df452fd 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -178,6 +178,20 @@ typedef struct RemoteSlot
 	ReplicationSlotInvalidationCause invalidated;
 } RemoteSlot;
 
+/*
+ * Copied identity of a local synchronized slot.
+ *
+ * The slot number is a cached array position used to avoid a later name scan.
+ * It is not sufficient as identity by itself because slot array entries can be
+ * reused after a slot is dropped.
+ */
+typedef struct LocalSyncedSlot
+{
+	NameData	name;
+	Oid			database;
+	int			slotno;
+} LocalSyncedSlot;
+
 static void slotsync_failure_callback(int code, Datum arg);
 static void update_synced_slots_inactive_since(void);
 
@@ -444,7 +458,7 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 }
 
 /*
- * Get the list of local logical slots that are synchronized from the
+ * Get copied identities of local logical slots that are synchronized from the
  * primary server.
  */
 static List *
@@ -459,10 +473,29 @@ get_local_synced_slots(void)
 		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
 
 		/* Check if it is a synchronized slot */
-		if (s->in_use && s->data.synced)
+		if (s->in_use)
 		{
-			Assert(SlotIsLogical(s));
-			local_slots = lappend(local_slots, s);
+			LocalSyncedSlot local_slot;
+			bool		synced;
+
+			SpinLockAcquire(&s->mutex);
+			synced = s->data.synced;
+			if (synced)
+			{
+				local_slot.name = s->data.name;
+				local_slot.database = s->data.database;
+				local_slot.slotno = i;
+			}
+			SpinLockRelease(&s->mutex);
+
+			if (synced)
+			{
+				LocalSyncedSlot *slot = palloc(sizeof(LocalSyncedSlot));
+
+				Assert(local_slot.database != InvalidOid);
+				*slot = local_slot;
+				local_slots = lappend(local_slots, slot);
+			}
 		}
 	}
 
@@ -471,34 +504,125 @@ get_local_synced_slots(void)
 	return local_slots;
 }
 
+/*
+ * Check whether the previously observed slot array cell still contains a
+ * synchronized logical slot matching the copied identity. If requested, copy
+ * the current invalidation cause.
+ */
+static bool
+local_synced_slot_matches(const LocalSyncedSlot *local_slot,
+						  ReplicationSlotInvalidationCause *invalidated)
+{
+	ReplicationSlot *slot;
+	bool		matches = false;
+
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+
+	Assert(local_slot->slotno >= 0);
+	Assert(local_slot->slotno <
+		   max_replication_slots + max_repack_replication_slots);
+
+	slot = &ReplicationSlotCtl->replication_slots[local_slot->slotno];
+	if (slot->in_use)
+	{
+		NameData	slot_name;
+		Oid			slot_database;
+		bool		slot_synced;
+		ReplicationSlotInvalidationCause slot_invalidated;
+
+		SpinLockAcquire(&slot->mutex);
+		slot_name = slot->data.name;
+		slot_database = slot->data.database;
+		slot_synced = slot->data.synced;
+		slot_invalidated = slot->data.invalidated;
+		SpinLockRelease(&slot->mutex);
+
+		/*
+		 * A synced slot's database can be updated by slot synchronization, so
+		 * database is not globally immutable. It is still part of the copied
+		 * identity here because slot synchronization is serialized by
+		 * SlotSyncCtx->syncing.
+		 */
+		matches = slot_database != InvalidOid &&
+			slot_synced &&
+			strcmp(NameStr(slot_name), NameStr(local_slot->name)) == 0 &&
+			slot_database == local_slot->database;
+
+		if (matches && invalidated)
+			*invalidated = slot_invalidated;
+	}
+
+	LWLockRelease(ReplicationSlotControlLock);
+
+	return matches;
+}
+
+/*
+ * Check whether the acquired slot still matches a previously copied local slot
+ * identity.
+ */
+static bool
+acquired_slot_matches(const LocalSyncedSlot *local_slot)
+{
+	ReplicationSlot *slot = MyReplicationSlot;
+	NameData	slot_name;
+	Oid			slot_database;
+	bool		slot_synced;
+	int			slotno;
+
+	Assert(slot != NULL);
+
+	/*
+	 * We own MyReplicationSlot, so slot.h allows reading its fields without
+	 * taking the slot mutex.
+	 */
+	slot_name = slot->data.name;
+	slot_database = slot->data.database;
+	slot_synced = slot->data.synced;
+	slotno = ReplicationSlotIndex(slot);
+
+	return slotno == local_slot->slotno &&
+		slot_database != InvalidOid &&
+		slot_synced &&
+		strcmp(NameStr(slot_name), NameStr(local_slot->name)) == 0 &&
+		slot_database == local_slot->database;
+}
+
 /*
  * Helper function to check if local_slot is required to be retained.
  *
  * Return false either if local_slot does not exist in the remote_slots list
- * or is invalidated while the corresponding remote slot is still valid,
- * otherwise true.
+ * or is invalidated while the corresponding remote slot is still valid, or if
+ * the copied local slot no longer matches the current slot array entry.
+ * Otherwise, return true.
  */
 static bool
-local_sync_slot_required(ReplicationSlot *local_slot, List *remote_slots)
+local_sync_slot_required(const LocalSyncedSlot *local_slot, List *remote_slots)
 {
 	bool		remote_exists = false;
 	bool		locally_invalidated = false;
 
 	foreach_ptr(RemoteSlot, remote_slot, remote_slots)
 	{
-		if (strcmp(remote_slot->name, NameStr(local_slot->data.name)) == 0)
+		if (strcmp(remote_slot->name, NameStr(local_slot->name)) == 0)
 		{
 			remote_exists = true;
 
 			/*
 			 * If remote slot is not invalidated but local slot is marked as
-			 * invalidated, then set locally_invalidated flag.
+			 * invalidated, then set locally_invalidated flag. When the remote
+			 * slot is already invalidated, keep the local slot as before and
+			 * avoid an unnecessary revalidation.
 			 */
-			SpinLockAcquire(&local_slot->mutex);
-			locally_invalidated =
-				(remote_slot->invalidated == RS_INVAL_NONE) &&
-				(local_slot->data.invalidated != RS_INVAL_NONE);
-			SpinLockRelease(&local_slot->mutex);
+			if (remote_slot->invalidated == RS_INVAL_NONE)
+			{
+				ReplicationSlotInvalidationCause invalidated;
+
+				if (!local_synced_slot_matches(local_slot, &invalidated))
+					return false;
+
+				locally_invalidated = invalidated != RS_INVAL_NONE;
+			}
 
 			break;
 		}
@@ -536,40 +660,31 @@ drop_local_obsolete_slots(List *remote_slot_list)
 {
 	List	   *local_slots = get_local_synced_slots();
 
-	foreach_ptr(ReplicationSlot, local_slot, local_slots)
+	foreach_ptr(LocalSyncedSlot, local_slot, local_slots)
 	{
 		/* Drop the local slot if it is not required to be retained. */
 		if (!local_sync_slot_required(local_slot, remote_slot_list))
 		{
 			bool		dropped = false;
-			NameData	slot_name = {0};
-			Oid			slot_database = local_slot->data.database;
-			bool		synced_slot;
 
 			/*
 			 * Use shared lock to prevent a conflict with
 			 * ReplicationSlotsDropDBSlots(), trying to drop the same slot
 			 * during a drop-database operation.
 			 */
-			LockSharedObject(DatabaseRelationId, slot_database, 0,
+			LockSharedObject(DatabaseRelationId, local_slot->database, 0,
 							 AccessShareLock);
 
 			/*
-			 * In the small window between getting the slot to drop and
-			 * locking the database, there is a possibility of a parallel
-			 * database drop by the startup process and the creation of a new
-			 * slot by the user. This new user-created slot may end up using
-			 * the same shared memory as that of 'local_slot'. Thus check if
-			 * local_slot is still the synced one before performing the actual
-			 * drop.
+			 * In the window between copying the slot identity and locking the
+			 * database, there is a possibility of a parallel database drop by
+			 * the startup process and the creation of a new slot by the user.
+			 * This new user-created slot may end up using the same shared
+			 * memory cell as the copied slot. Thus check whether the cached
+			 * slot position still contains the same synced slot before
+			 * performing the actual drop.
 			 */
-			SpinLockAcquire(&local_slot->mutex);
-			synced_slot = local_slot->in_use && local_slot->data.synced;
-			if (synced_slot)
-				slot_name = local_slot->data.name;
-			SpinLockRelease(&local_slot->mutex);
-
-			if (synced_slot)
+			if (local_synced_slot_matches(local_slot, NULL))
 			{
 				/*
 				 * Now acquire and drop the slot.  Note we purposely don't
@@ -577,21 +692,33 @@ drop_local_obsolete_slots(List *remote_slot_list)
 				 * a standby, which derives its logical decoding state from
 				 * the primary, it would be wrong to do so.
 				 */
-				ReplicationSlotAcquire(NameStr(slot_name), true, false);
-				ReplicationSlotDropAcquired(false);
-				dropped = true;
+				ReplicationSlotAcquire(NameStr(local_slot->name), true, false);
+
+				/*
+				 * Recheck the acquired slot defensively in case the slot
+				 * changed between revalidation and acquisition.
+				 */
+				if (acquired_slot_matches(local_slot))
+				{
+					ReplicationSlotDropAcquired(false);
+					dropped = true;
+				}
+				else
+					ReplicationSlotRelease();
 			}
 
-			UnlockSharedObject(DatabaseRelationId, slot_database, 0,
+			UnlockSharedObject(DatabaseRelationId, local_slot->database, 0,
 							   AccessShareLock);
 
 			if (dropped)
 				ereport(LOG,
 						errmsg("dropped replication slot \"%s\" of database with OID %u",
-							   NameStr(slot_name),
-							   slot_database));
+							   NameStr(local_slot->name),
+							   local_slot->database));
 		}
 	}
+
+	list_free_deep(local_slots);
 }
 
 /*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8cf40c87043..663383b0531 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1646,6 +1646,7 @@ LoInfo
 LoadStmt
 LocalBufferLookupEnt
 LocalPgBackendStatus
+LocalSyncedSlot
 LocalTransactionId
 Location
 LocationIndex
-- 
2.51.0



view thread (27+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Fix race in ReplicationSlotRelease for ephemeral slots
  In-Reply-To: <CABPTF7WBh_mKi60EYLiueaZ_cdJvnrOrpSt3hQkuZ_uY4w5duA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox