From: Zhijie Hou (Fujitsu) <[email protected]>
To: Amit Kapila <[email protected]>
Cc: Xuneng Zhou <[email protected]>
Cc: Fujii Masao <[email protected]>
Cc: Srinath Reddy Sadipiralla <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: RE: Fix race in ReplicationSlotRelease for ephemeral slots
Date: Tue, 16 Jun 2026 10:32:14 +0000
Message-ID: <TY4PR01MB177185B82B7BD0A20028E4CED94E52@TY4PR01MB17718.jpnprd01.prod.outlook.com>
In-Reply-To: <CAA4eK1K7e1Y2iYkmRZ5CCh0pZOTMUShKDj0nP4nY3Wdcypt7oQ@mail.gmail.com>
References: <TY4PR01MB177184FF9EE916F577E1F554194082@TY4PR01MB17718.jpnprd01.prod.outlook.com>
<CAFC+b6o-hD5VxVLZQovmHSYykF8Qzq3eiuBU-U1F_yR9-y6P_w@mail.gmail.com>
<TY4PR01MB177180A7CE60BCDF286B1C6F594172@TY4PR01MB17718.jpnprd01.prod.outlook.com>
<CABPTF7VyH1-W2xnDspECDEzFGQj=WTFpZBCqKfM11OAZa6gQHQ@mail.gmail.com>
<CAHGQGwE+2WSqiAYgNJRkf_twdB+uRGozjjGhUn76vUKZ8dzbSA@mail.gmail.com>
<CABPTF7VeA8szPv7LYDVY9_7LftV-HM8NFVQR2natPKmr73JW+A@mail.gmail.com>
<TY4PR01MB1771887D33612C5A45F7E9CDF941E2@TY4PR01MB17718.jpnprd01.prod.outlook.com>
<CAA4eK1LqFBKCkX2eoX3iQPxJJnzWTaCpdh9zNotxuoG8BgjdtA@mail.gmail.com>
<CAA4eK1LkRdbm5XA=qa82Rp_y4rnyJh8pypMWVqOezOZpzy=Oaw@mail.gmail.com>
<CAHGQGwG_3ff4HciHtTZ_uMvbJgSDWsz4Yawj_zQpDG6Yj=Mjng@mail.gmail.com>
<CABPTF7WBh_mKi60EYLiueaZ_cdJvnrOrpSt3hQkuZ_uY4w5duA@mail.gmail.com>
<CAA4eK1LJ9=BJU2oK5aFCfvW=w2muSXNHOPM18wHXHLkRzYxhTQ@mail.gmail.com>
<CABPTF7VdFwiROsch4T7VbOCqQYpRbh==gAZPM6tJeff5Ou80Qw@mail.gmail.com>
<TY4PR01MB17718F4D0C5C8EB96A303C2E594E52@TY4PR01MB17718.jpnprd01.prod.outlook.com>
<CAA4eK1K7e1Y2iYkmRZ5CCh0pZOTMUShKDj0nP4nY3Wdcypt7oQ@mail.gmail.com>
On Tuesday, June 16, 2026 5:36 PM Amit Kapila <[email protected]> wrote:
>
> On Tue, Jun 16, 2026 at 2:24 PM Zhijie Hou (Fujitsu) <[email protected]>
> wrote:
> >
> >
> > I have one minor comment for the 0001 patch.
> >
> > + NameData slot_name = {0};
> > ...
> > SpinLockAcquire(&local_slot->mutex);
> > synced_slot = local_slot->in_use &&
> > local_slot->data.synced;
> > + if (synced_slot)
> > + slot_name = local_slot->data.name;
> > SpinLockRelease(&local_slot->mutex);
> >
> > We can defer assigning slot_name until after we pass the existing
> > (synced_slot) check. Since it's a synced slot, no other process can
> > change it at that point, and we can also skip initializing slot_name.
> > (Please refer to the attached patch for suggested changes)
> >
>
> + if (dropped)
> + ereport(LOG,
> + errmsg("dropped replication slot \"%s\" of database with OID %u",
> + NameStr(slot_name),
> + slot_database));
>
> Can we avoid the if (dropped) check by placing this LOG message immediately
> after dropping the slot under synced slot check?

I think we can do that. I'm attaching new patches for all supported branches,
incorporating both my comments and Amit's. I hope this helps move the fix
forward.

I also confirmed that the fix works on all supported branches.

Best Regards,
Hou zj
Attachments:
[application/octet-stream] v2_PG17-0001-Avoid-stale-slot-access-after-dropping-obsol.patch (2.7K, 2-v2_PG17-0001-Avoid-stale-slot-access-after-dropping-obsol.patch)
From 41350e61f9d0a5060a7a76fc1de91ba944f4328c Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Tue, 16 Jun 2026 18:15:35 +0800
Subject: [PATCH v2_PG18] Avoid stale slot access after dropping obsolete
synced slots
drop_local_obsolete_slots() kept using local_slot after calling
ReplicationSlotDropAcquired(). Once the drop completes, the slot array entry can
be reused by another backend, so later reads of local_slot->data could refer to a
different slot.
Copy the slot name and database OID before dropping the slot, and use those
saved values for unlocking and logging after the drop.
Author: Xuneng Zhou <[email protected]>
Reviewed-by: Zhijie Hou <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
---
src/backend/replication/logical/slotsync.c | 23 ++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index bc42d74fec2..c4dda8aa5f1 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -463,6 +463,7 @@ drop_local_obsolete_slots(List *remote_slot_list)
/* Drop the local slot if it is not required to be retained. */
if (!local_sync_slot_required(local_slot, remote_slot_list))
{
+ Oid slot_database = local_slot->data.database;
bool synced_slot;
/*
@@ -470,8 +471,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
* ReplicationSlotsDropDBSlots(), trying to drop the same slot
* during a drop-database operation.
*/
- LockSharedObject(DatabaseRelationId, local_slot->data.database,
- 0, AccessShareLock);
+ LockSharedObject(DatabaseRelationId, slot_database, 0,
+ AccessShareLock);
/*
* In the small window between getting the slot to drop and
@@ -488,17 +489,19 @@ drop_local_obsolete_slots(List *remote_slot_list)
if (synced_slot)
{
- ReplicationSlotAcquire(NameStr(local_slot->data.name), true);
+ NameData slot_name = local_slot->data.name;
+
+ ReplicationSlotAcquire(NameStr(slot_name), true);
ReplicationSlotDropAcquired();
- }
- UnlockSharedObject(DatabaseRelationId, local_slot->data.database,
- 0, AccessShareLock);
+ ereport(LOG,
+ errmsg("dropped replication slot \"%s\" of database with OID %u",
+ NameStr(slot_name),
+ slot_database));
+ }
- ereport(LOG,
- errmsg("dropped replication slot \"%s\" of database with OID %u",
- NameStr(local_slot->data.name),
- local_slot->data.database));
+ UnlockSharedObject(DatabaseRelationId, slot_database, 0,
+ AccessShareLock);
}
}
}
--
2.43.0
[application/octet-stream] v2-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patch (2.9K, 3-v2-0001-Avoid-stale-slot-access-after-dropping-obsolete-s.patch)
From 898dfaeab0a721b8d2902e6d7b4b799193b18aee Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 2 Jun 2026 13:14:54 +0800
Subject: [PATCH v2] Avoid stale slot access after dropping obsolete synced slots
drop_local_obsolete_slots() kept using local_slot after calling
ReplicationSlotDropAcquired(). Once the drop completes, the slot array entry can
be reused by another backend, so later reads of local_slot->data could refer to a
different slot.
Copy the slot name and database OID before dropping the slot, and use those
saved values for unlocking and logging after the drop.
Author: Xuneng Zhou <[email protected]>
Reviewed-by: Zhijie Hou <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
---
src/backend/replication/logical/slotsync.c | 23 ++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 96107c9475d..05637344363 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -541,6 +541,7 @@ drop_local_obsolete_slots(List *remote_slot_list)
/* Drop the local slot if it is not required to be retained. */
if (!local_sync_slot_required(local_slot, remote_slot_list))
{
+ Oid slot_database = local_slot->data.database;
bool synced_slot;
/*
@@ -548,8 +549,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
* ReplicationSlotsDropDBSlots(), trying to drop the same slot
* during a drop-database operation.
*/
- LockSharedObject(DatabaseRelationId, local_slot->data.database,
- 0, AccessShareLock);
+ LockSharedObject(DatabaseRelationId, slot_database, 0,
+ AccessShareLock);
/*
* In the small window between getting the slot to drop and
@@ -566,23 +567,25 @@ drop_local_obsolete_slots(List *remote_slot_list)
if (synced_slot)
{
+ NameData slot_name = local_slot->data.name;
+
/*
* Now acquire and drop the slot. Note we purposely don't
* request logical decoding to be disabled here: since this is
* a standby, which derives its logical decoding state from
* the primary, it would be wrong to do so.
*/
- ReplicationSlotAcquire(NameStr(local_slot->data.name), true, false);
+ ReplicationSlotAcquire(NameStr(slot_name), true, false);
ReplicationSlotDropAcquired(false);
- }
- UnlockSharedObject(DatabaseRelationId, local_slot->data.database,
- 0, AccessShareLock);
+ ereport(LOG,
+ errmsg("dropped replication slot \"%s\" of database with OID %u",
+ NameStr(slot_name),
+ slot_database));
+ }
- ereport(LOG,
- errmsg("dropped replication slot \"%s\" of database with OID %u",
- NameStr(local_slot->data.name),
- local_slot->data.database));
+ UnlockSharedObject(DatabaseRelationId, slot_database, 0,
+ AccessShareLock);
}
}
}
--
2.43.0
[application/octet-stream] v2_PG18-0001-Avoid-stale-slot-access-after-dropping-obsol.patch (2.7K, 4-v2_PG18-0001-Avoid-stale-slot-access-after-dropping-obsol.patch)
From 41350e61f9d0a5060a7a76fc1de91ba944f4328c Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Tue, 16 Jun 2026 18:15:35 +0800
Subject: [PATCH v2_PG18] Avoid stale slot access after dropping obsolete
synced slots
drop_local_obsolete_slots() kept using local_slot after calling
ReplicationSlotDropAcquired(). Once the drop completes, the slot array entry can
be reused by another backend, so later reads of local_slot->data could refer to a
different slot.
Copy the slot name and database OID before dropping the slot, and use those
saved values for unlocking and logging after the drop.
Author: Xuneng Zhou <[email protected]>
Reviewed-by: Zhijie Hou <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
---
src/backend/replication/logical/slotsync.c | 23 ++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index bc42d74fec2..c4dda8aa5f1 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -463,6 +463,7 @@ drop_local_obsolete_slots(List *remote_slot_list)
/* Drop the local slot if it is not required to be retained. */
if (!local_sync_slot_required(local_slot, remote_slot_list))
{
+ Oid slot_database = local_slot->data.database;
bool synced_slot;
/*
@@ -470,8 +471,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
* ReplicationSlotsDropDBSlots(), trying to drop the same slot
* during a drop-database operation.
*/
- LockSharedObject(DatabaseRelationId, local_slot->data.database,
- 0, AccessShareLock);
+ LockSharedObject(DatabaseRelationId, slot_database, 0,
+ AccessShareLock);
/*
* In the small window between getting the slot to drop and
@@ -488,17 +489,19 @@ drop_local_obsolete_slots(List *remote_slot_list)
if (synced_slot)
{
- ReplicationSlotAcquire(NameStr(local_slot->data.name), true, false);
+ NameData slot_name = local_slot->data.name;
+
+ ReplicationSlotAcquire(NameStr(slot_name), true, false);
ReplicationSlotDropAcquired();
- }
- UnlockSharedObject(DatabaseRelationId, local_slot->data.database,
- 0, AccessShareLock);
+ ereport(LOG,
+ errmsg("dropped replication slot \"%s\" of database with OID %u",
+ NameStr(slot_name),
+ slot_database));
+ }
- ereport(LOG,
- errmsg("dropped replication slot \"%s\" of database with OID %u",
- NameStr(local_slot->data.name),
- local_slot->data.database));
+ UnlockSharedObject(DatabaseRelationId, slot_database, 0,
+ AccessShareLock);
}
}
}
--
2.43.0
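
The commit message in the patches above describes the hazard: once a slot is
dropped, its array entry can be reused, so reads through the old pointer may
see a different slot. The following is a minimal standalone sketch of the
copy-before-drop pattern, not PostgreSQL code. ToyName, ToySlot, toy_slots,
toy_slot_create, toy_slot_drop and toy_drop_and_report are hypothetical
stand-ins invented for illustration, and the concurrent reuse is simulated by
a direct call rather than by a second process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_NAMEDATALEN 64
#define TOY_MAX_SLOTS 2

/* Hypothetical stand-ins for NameData and ReplicationSlot; not the real structs. */
typedef struct ToyName
{
	char		data[TOY_NAMEDATALEN];
} ToyName;

typedef struct ToySlot
{
	bool		in_use;
	ToyName		name;
	unsigned int database;
} ToySlot;

/* Fixed-size array of slot entries; static storage starts zeroed (all free). */
static ToySlot toy_slots[TOY_MAX_SLOTS];

/* Claim the first free entry, as another backend creating a slot might. */
static ToySlot *
toy_slot_create(const char *name, unsigned int database)
{
	for (int i = 0; i < TOY_MAX_SLOTS; i++)
	{
		ToySlot    *slot = &toy_slots[i];

		if (!slot->in_use)
		{
			memset(slot, 0, sizeof(*slot));
			slot->in_use = true;
			snprintf(slot->name.data, TOY_NAMEDATALEN, "%s", name);
			slot->database = database;
			return slot;
		}
	}
	return NULL;				/* no free entry */
}

/* Mark the entry free; from this point any caller may reuse it. */
static void
toy_slot_drop(ToySlot *slot)
{
	assert(slot != NULL && slot->in_use);
	slot->in_use = false;
}

/*
 * Drop a slot and build the log line from values copied before the drop.
 * The toy_slot_create() call simulates the entry being reused right away.
 */
static void
toy_drop_and_report(ToySlot *slot, char *buf, size_t buflen)
{
	ToyName		saved_name = slot->name;	/* struct copy, by value */
	unsigned int saved_database = slot->database;

	toy_slot_drop(slot);
	(void) toy_slot_create("other_slot", 99);	/* simulated reuse */

	/* Only the saved copies are read here, never slot->... */
	snprintf(buf, buflen,
			 "dropped replication slot \"%s\" of database with OID %u",
			 saved_name.data, saved_database);
}
```

In this toy model, creating a slot named "sync_slot" for database 5 and then
calling toy_drop_and_report() on it produces a message that still names
"sync_slot" and OID 5, even though the same array entry now holds
"other_slot". Reading slot->name after the drop would have reported the new
occupant instead.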