public inbox for [email protected]
help / color / mirror / Atom feedFrom: Srinath Reddy Sadipiralla <[email protected]>
To: Zhijie Hou (Fujitsu) <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Fix race in ReplicationSlotRelease for ephemeral slots
Date: Fri, 29 May 2026 22:14:10 +0530
Message-ID: <CAFC+b6o-hD5VxVLZQovmHSYykF8Qzq3eiuBU-U1F_yR9-y6P_w@mail.gmail.com> (raw)
In-Reply-To: <TY4PR01MB177184FF9EE916F577E1F554194082@TY4PR01MB17718.jpnprd01.prod.outlook.com>
References: <TY4PR01MB177184FF9EE916F577E1F554194082@TY4PR01MB17718.jpnprd01.prod.outlook.com>
Hi,
On Wed, May 27, 2026 at 5:20 PM Zhijie Hou (Fujitsu) <[email protected]>
wrote:
> Hi,
>
> While testing the slot release logic, I noticed a bug in
> ReplicationSlotRelease() where it may access a replication slot array
> entry that
> has already been released by itself.
>
> The detail is: When releasing an ephemeral replication slot,
> ReplicationSlotRelease() first drops the slot via
> ReplicationSlotDropAcquired().
> After this point, the slot's shared memory slot array entry can be
> immediately
> reused by another backend creating a new slot.
>
> However, ReplicationSlotRelease() continued executing common cleanup code
> that
> still dereferenced the old slot pointer and updated shared memory fields
> such as
> effective_xmin. If the slot array entry had already been reallocated, these
> writes could inadvertently affect a different, unrelated slot.
>
> I am attaching a patch that avoids touching slot shared-memory state after
> dropping an ephemeral slot. Keep the post-release shared-memory updates
> only for
> non-ephemeral slots, where the slot remains valid after release.
>
> To reproduce, we can use the following steps:
>
> 1. Attach gdb to the backend and set a breakpoint in
> ReplicationSlotRelease()
> right after ReplicationSlotDropAcquired() is called.
> 2. Create an ephemeral slot in the above backend with an invalid output
> plugin:
> SELECT pg_create_logical_replication_slot('test_slot_dropped',
> 'pgoutput2', false, false, true);
> 3. Once the breakpoint is hit, start another backend and create a new slot
> named 'test_slot_created'.
> 4. Release the breakpoint and allow the first backend to continue. At this
> point, you will see it updating the new slot 'test_slot_created' ->
> active_proc
> (and effective_xmin, if a snapshot is being exported) to invalid values.
> 5. Start a third backend and attempt to acquire the same slot
> 'test_slot_created' ? this should not be possible under normal
> circumstances,
> but the bug allows it.
>
patch LGTM.
>
> I haven't attached a test for this fix, as the change is straightforward
> and the
> likelihood of encountering this bug is low, so it may not be worth adding
> test
> cycles for it. However, if others feel differently, I'm OK to add one.
>
+1 for a test. The fix is just an else, so a future refactor could change
it and silently
reintroduce the corruption, since it scribbles on an unrelated reused slot,
nothing
would catch it. Injection points make it deterministic; I've attached a
diff patch that adds
a test that fails without the fix and passes with it.
--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
Attachments:
[application/octet-stream] nocfbot-test.patch (4.3K, 3-nocfbot-test.patch)
download | inline diff:
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index f012e99c9d7..a91082efb9c 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -787,6 +787,14 @@ ReplicationSlotRelease(void)
* decoding be disabled.
*/
ReplicationSlotDropAcquired(is_logical);
+
+ /*
+ * The slot's array entry is now free and may be reused by another
+ * backend at any moment. This injection point lets tests open that
+ * window deterministically and verify we never touch the (now stale)
+ * slot pointer afterwards.
+ */
+ INJECTION_POINT("ephemeral-slot-release-after-drop", NULL);
}
/*
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index 97d11f98b59..fa5a0222dcd 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -275,6 +275,80 @@ is( $node_primary->safe_psql(
qq(Check that reset timestamp is later after resetting stats for slot '$stats_test_slot1' again.)
);
+# Regression test for the race in ReplicationSlotRelease() when releasing an
+# ephemeral slot. Releasing an ephemeral slot first drops it via
+# ReplicationSlotDropAcquired(), which frees the slot's array entry for reuse.
+# ReplicationSlotRelease() must not touch that (now stale) slot pointer
+# afterwards, or it can corrupt an unrelated slot that grabbed the freed entry.
+# This needs an injection point, so it only runs on builds that support them.
+SKIP:
+{
+ skip "Injection points not supported by this build", 2
+ if $ENV{enable_injection_points} ne 'yes';
+ skip "Extension injection_points not installed", 2
+ unless $node_primary->check_extension('injection_points');
+
+ $node_primary->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+ # Freeze a backend right after it drops an ephemeral slot, i.e. once the
+ # slot's array entry has been freed and is available for reuse.
+ $node_primary->safe_psql('postgres',
+ q{SELECT injection_points_attach('ephemeral-slot-release-after-drop', 'wait')}
+ );
+
+ # Create a logical slot with a non-existent output plugin. The slot is
+ # created as RS_EPHEMERAL; the bogus plugin makes creation ERROR out, and
+ # the error path calls ReplicationSlotRelease(), which drops the ephemeral
+ # slot and then blocks on the injection point. on_error_stop => 0 keeps the
+ # session alive past the ERROR so it can still report 'done_release'.
+ my $dropper =
+ $node_primary->background_psql('postgres', on_error_stop => 0);
+ $dropper->query_until(
+ qr/start_drop/, q(
+\echo start_drop
+SELECT pg_create_logical_replication_slot('test_slot_dropped', 'no_such_plugin', false);
+\echo done_release
+));
+
+ # Wait until the backend is parked on the injection point, with the array
+ # entry of 'test_slot_dropped' freed and up for grabs.
+ $node_primary->wait_for_event('client backend',
+ 'ephemeral-slot-release-after-drop');
+
+ # A second backend creates a new persistent slot, which reuses the just
+ # freed array entry -- the very entry the frozen backend still points at.
+ $node_primary->safe_psql('postgres',
+ q{SELECT pg_create_physical_replication_slot('test_slot_created', true, false)}
+ );
+
+ my $before = $node_primary->safe_psql('postgres',
+ q{SELECT inactive_since FROM pg_replication_slots WHERE slot_name = 'test_slot_created'}
+ );
+
+ # Let the frozen backend finish releasing. With the bug it now scribbles on
+ # the reused entry; with the fix it leaves it untouched.
+ $node_primary->safe_psql('postgres',
+ q{SELECT injection_points_wakeup('ephemeral-slot-release-after-drop')});
+ $dropper->query_until(qr/done_release/, '');
+
+ my $after = $node_primary->safe_psql('postgres',
+ q{SELECT inactive_since FROM pg_replication_slots WHERE slot_name = 'test_slot_created'}
+ );
+
+ is($after, $before,
+ 'releasing an ephemeral slot must not modify a slot that reused its entry'
+ );
+
+ my $slot = $node_primary->safe_psql('postgres',
+ q{SELECT slot_name, slot_type FROM pg_replication_slots WHERE slot_name = 'test_slot_created'}
+ );
+ is($slot, "test_slot_created|physical", 'reused slot is intact');
+
+ $dropper->quit;
+ $node_primary->safe_psql('postgres',
+ q{SELECT injection_points_detach('ephemeral-slot-release-after-drop')});
+}
+
# done with the node
$node_primary->stop;
view thread (27+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected]
Subject: Re: Fix race in ReplicationSlotRelease for ephemeral slots
In-Reply-To: <CAFC+b6o-hD5VxVLZQovmHSYykF8Qzq3eiuBU-U1F_yR9-y6P_w@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox