public inbox for [email protected]  
help / color / mirror / Atom feed
From: Srinath Reddy Sadipiralla <[email protected]>
To: Zhijie Hou (Fujitsu) <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Fix race in ReplicationSlotRelease for ephemeral slots
Date: Fri, 29 May 2026 22:14:10 +0530
Message-ID: <CAFC+b6o-hD5VxVLZQovmHSYykF8Qzq3eiuBU-U1F_yR9-y6P_w@mail.gmail.com> (raw)
In-Reply-To: <TY4PR01MB177184FF9EE916F577E1F554194082@TY4PR01MB17718.jpnprd01.prod.outlook.com>
References: <TY4PR01MB177184FF9EE916F577E1F554194082@TY4PR01MB17718.jpnprd01.prod.outlook.com>

Hi,

On Wed, May 27, 2026 at 5:20 PM Zhijie Hou (Fujitsu) <[email protected]>
wrote:

> Hi,
>
> While testing the slot release logic, I noticed a bug in
> ReplicationSlotRelease() where it may access a replication slot array
> entry that
> has already been released by itself.
>
> The detail is: When releasing an ephemeral replication slot,
> ReplicationSlotRelease() first drops the slot via
> ReplicationSlotDropAcquired().
> After this point, the slot's shared memory slot array entry can be
> immediately
> reused by another backend creating a new slot.
>
> However, ReplicationSlotRelease() continued executing common cleanup code
> that
> still dereferenced the old slot pointer and updated shared memory fields
> such as
> effective_xmin. If the slot array entry had already been reallocated, these
> writes could inadvertently affect a different, unrelated slot.
>
> I am attaching a patch that avoids touching slot shared-memory state after
> dropping an ephemeral slot. Keep the post-release shared-memory updates
> only for
> non-ephemeral slots, where the slot remains valid after release.
>
> To reproduce, we can use the following steps:
>
> 1. Attach gdb to the backend and set a breakpoint in
> ReplicationSlotRelease()
>    right after ReplicationSlotDropAcquired() is called.
> 2. Create an ephemeral slot in the above backend with an invalid output
> plugin:
>    SELECT pg_create_logical_replication_slot('test_slot_dropped',
> 'pgoutput2', false, false, true);
> 3. Once the breakpoint is hit, start another backend and create a new slot
>    named 'test_slot_created'.
> 4. Release the breakpoint and allow the first backend to continue. At this
>    point, you will see it updating the new slot 'test_slot_created' ->
> active_proc
>    (and effective_xmin, if a snapshot is being exported) to invalid values.
> 5. Start a third backend and attempt to acquire the same slot
>    'test_slot_created' ? this should not be possible under normal
> circumstances,
>    but the bug allows it.
>

patch LGTM.


>
> I haven't attached a test for this fix, as the change is straightforward
> and the
> likelihood of encountering this bug is low, so it may not be worth adding
> test
> cycles for it. However, if others feel differently, I'm OK to add one.
>

+1 for a test. The fix is just an else, so a future refactor could change
it and silently
reintroduce the corruption, since it scribbles on an unrelated reused slot,
nothing
would catch it. Injection points make it deterministic; I've attached a
diff patch that adds
a test that fails without the fix and passes with it.


-- 
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/


Attachments:

  [application/octet-stream] nocfbot-test.patch (4.3K, 3-nocfbot-test.patch)
  download | inline diff:
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f012e99c9d7..a91082efb9c 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -787,6 +787,14 @@ ReplicationSlotRelease(void)
 		 * decoding be disabled.
 		 */
 		ReplicationSlotDropAcquired(is_logical);
+
+		/*
+		 * The slot's array entry is now free and may be reused by another
+		 * backend at any moment.  This injection point lets tests open that
+		 * window deterministically and verify we never touch the (now stale)
+		 * slot pointer afterwards.
+		 */
+		INJECTION_POINT("ephemeral-slot-release-after-drop", NULL);
 	}
 
 	/*
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index 97d11f98b59..fa5a0222dcd 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -275,6 +275,80 @@ is( $node_primary->safe_psql(
 	qq(Check that reset timestamp is later after resetting stats for slot '$stats_test_slot1' again.)
 );
 
+# Regression test for the race in ReplicationSlotRelease() when releasing an
+# ephemeral slot.  Releasing an ephemeral slot first drops it via
+# ReplicationSlotDropAcquired(), which frees the slot's array entry for reuse.
+# ReplicationSlotRelease() must not touch that (now stale) slot pointer
+# afterwards, or it can corrupt an unrelated slot that grabbed the freed entry.
+# This needs an injection point, so it only runs on builds that support them.
+SKIP:
+{
+	skip "Injection points not supported by this build", 2
+	  if $ENV{enable_injection_points} ne 'yes';
+	skip "Extension injection_points not installed", 2
+	  unless $node_primary->check_extension('injection_points');
+
+	$node_primary->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+	# Freeze a backend right after it drops an ephemeral slot, i.e. once the
+	# slot's array entry has been freed and is available for reuse.
+	$node_primary->safe_psql('postgres',
+		q{SELECT injection_points_attach('ephemeral-slot-release-after-drop', 'wait')}
+	);
+
+	# Create a logical slot with a non-existent output plugin.  The slot is
+	# created as RS_EPHEMERAL; the bogus plugin makes creation ERROR out, and
+	# the error path calls ReplicationSlotRelease(), which drops the ephemeral
+	# slot and then blocks on the injection point.  on_error_stop => 0 keeps the
+	# session alive past the ERROR so it can still report 'done_release'.
+	my $dropper =
+	  $node_primary->background_psql('postgres', on_error_stop => 0);
+	$dropper->query_until(
+		qr/start_drop/, q(
+\echo start_drop
+SELECT pg_create_logical_replication_slot('test_slot_dropped', 'no_such_plugin', false);
+\echo done_release
+));
+
+	# Wait until the backend is parked on the injection point, with the array
+	# entry of 'test_slot_dropped' freed and up for grabs.
+	$node_primary->wait_for_event('client backend',
+		'ephemeral-slot-release-after-drop');
+
+	# A second backend creates a new persistent slot, which reuses the just
+	# freed array entry -- the very entry the frozen backend still points at.
+	$node_primary->safe_psql('postgres',
+		q{SELECT pg_create_physical_replication_slot('test_slot_created', true, false)}
+	);
+
+	my $before = $node_primary->safe_psql('postgres',
+		q{SELECT inactive_since FROM pg_replication_slots WHERE slot_name = 'test_slot_created'}
+	);
+
+	# Let the frozen backend finish releasing.  With the bug it now scribbles on
+	# the reused entry; with the fix it leaves it untouched.
+	$node_primary->safe_psql('postgres',
+		q{SELECT injection_points_wakeup('ephemeral-slot-release-after-drop')});
+	$dropper->query_until(qr/done_release/, '');
+
+	my $after = $node_primary->safe_psql('postgres',
+		q{SELECT inactive_since FROM pg_replication_slots WHERE slot_name = 'test_slot_created'}
+	);
+
+	is($after, $before,
+		'releasing an ephemeral slot must not modify a slot that reused its entry'
+	);
+
+	my $slot = $node_primary->safe_psql('postgres',
+		q{SELECT slot_name, slot_type FROM pg_replication_slots WHERE slot_name = 'test_slot_created'}
+	);
+	is($slot, "test_slot_created|physical", 'reused slot is intact');
+
+	$dropper->quit;
+	$node_primary->safe_psql('postgres',
+		q{SELECT injection_points_detach('ephemeral-slot-release-after-drop')});
+}
+
 # done with the node
 $node_primary->stop;
 


view thread (27+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Fix race in ReplicationSlotRelease for ephemeral slots
  In-Reply-To: <CAFC+b6o-hD5VxVLZQovmHSYykF8Qzq3eiuBU-U1F_yR9-y6P_w@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox