Re: Fix race in ReplicationSlotRelease for ephemeral slots

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Xuneng Zhou <[email protected]>
To: Amit Kapila <[email protected]>
Cc: Fujii Masao <[email protected]>
Cc: Zhijie Hou (Fujitsu) <[email protected]>
Cc: Srinath Reddy Sadipiralla <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Fix race in ReplicationSlotRelease for ephemeral slots
Date: Thu, 18 Jun 2026 16:36:32 +0800
Message-ID: <CABPTF7VxwhYAAFg+71jOg_MQLcJaPfPu40ADC8imyXwr-H967w@mail.gmail.com> (raw)
In-Reply-To: <CAA4eK1KT-FMjXfFdg98qq_04PNnW6T1G-SUNU9dUtn4AC5g86A@mail.gmail.com>
References: <TY4PR01MB177184FF9EE916F577E1F554194082@TY4PR01MB17718.jpnprd01.prod.outlook.com>
	<CAFC+b6o-hD5VxVLZQovmHSYykF8Qzq3eiuBU-U1F_yR9-y6P_w@mail.gmail.com>
	<TY4PR01MB177180A7CE60BCDF286B1C6F594172@TY4PR01MB17718.jpnprd01.prod.outlook.com>
	<CABPTF7VyH1-W2xnDspECDEzFGQj=WTFpZBCqKfM11OAZa6gQHQ@mail.gmail.com>
	<CAHGQGwE+2WSqiAYgNJRkf_twdB+uRGozjjGhUn76vUKZ8dzbSA@mail.gmail.com>
	<CABPTF7VeA8szPv7LYDVY9_7LftV-HM8NFVQR2natPKmr73JW+A@mail.gmail.com>
	<TY4PR01MB1771887D33612C5A45F7E9CDF941E2@TY4PR01MB17718.jpnprd01.prod.outlook.com>
	<CAA4eK1LqFBKCkX2eoX3iQPxJJnzWTaCpdh9zNotxuoG8BgjdtA@mail.gmail.com>
	<CAA4eK1LkRdbm5XA=qa82Rp_y4rnyJh8pypMWVqOezOZpzy=Oaw@mail.gmail.com>
	<CAHGQGwG_3ff4HciHtTZ_uMvbJgSDWsz4Yawj_zQpDG6Yj=Mjng@mail.gmail.com>
	<CABPTF7WBh_mKi60EYLiueaZ_cdJvnrOrpSt3hQkuZ_uY4w5duA@mail.gmail.com>
	<CAA4eK1LJ9=BJU2oK5aFCfvW=w2muSXNHOPM18wHXHLkRzYxhTQ@mail.gmail.com>
	<CAHGQGwGGyEDL3dh7uJ6qPsGvnq4QK_R8+U=12CaprnzwrwaLGA@mail.gmail.com>
	<CABPTF7UCqndPh8jucFtWBpFMoA2oQkSObQGXVVQNVGMZ1q-DCg@mail.gmail.com>
	<CAA4eK1KT-FMjXfFdg98qq_04PNnW6T1G-SUNU9dUtn4AC5g86A@mail.gmail.com>

On Thu, Jun 18, 2026 at 11:20 AM Amit Kapila <[email protected]> wrote:
>
> On Wed, Jun 17, 2026 at 12:59 PM Xuneng Zhou <[email protected]> wrote:
> >
> > On Tue, Jun 16, 2026 at 8:46 PM Fujii Masao <[email protected]> wrote:
> > >
> > > On Fri, Jun 12, 2026 at 7:54 PM Amit Kapila <[email protected]> wrote:
> > > > I feel even if there is an argument to do such a refactoring, it can
> > > > be done separately. We can push forward with 0001 and then do more
> > > > discussion for 0002, if required. I can take care of 0001 unless
> > > > Fujii-San wishes to take care of it?
> > >
> > > Yeah, please feel free to work on 0001.
> > >
> > > Regarding 0002, since the race is very rare and non-fatal, I'm okay
> > > with accepting the risk rather than adding more refactoring just to
> > > avoid it.
> > >
> > > I'm a bit tempted to add a source comment explaining the risk and
> > > why we accept it, though, so other developers can understand
> > > the tradeoff. For example:
> > >
> > > diff --git a/src/backend/replication/logical/slotsync.c
> > > b/src/backend/replication/logical/slotsync.c
> > > index 05637344363..ca49f20e7d9 100644
> > > --- a/src/backend/replication/logical/slotsync.c
> > > +++ b/src/backend/replication/logical/slotsync.c
> > > @@ -560,6 +560,12 @@ drop_local_obsolete_slots(List *remote_slot_list)
> > >                          * the same shared memory as that of
> > > 'local_slot'. Thus check if
> > >                          * local_slot is still the synced one before
> > > performing the actual
> > >                          * drop.
> > > +                        *
> > > +                        * Because local_slot still points to a
> > > reusable slot-array entry,
> > > +                        * fields such as name or database OID could
> > > already be stale here.
> > > +                        * That could cause an incorrect cleanup
> > > decision for this cycle or
> > > +                        * briefly lock an unrelated database. We
> > > accept that risk because
> > > +                        * this race is rare and non-fatal.
> > >                          */
> > >                         SpinLockAcquire(&local_slot->mutex);
> > >                         synced_slot = local_slot->in_use &&
> > > local_slot->data.synced;
> >
> > Thanks for suggesting the comment! It helps to clarify the situation
> > and the trade-off we made here. I tweaked it a bit and added it to the
> > patches prepared by Zhijie.
> >
>
> + *
> + * We cannot close this window by holding
> + * ReplicationSlotControlLock while taking the database lock,
> + * because the database-drop path holds the database lock and then
> + * scans replication slots.
>
> The database-drop path acquires ReplicationSlotControlLock in shared
> mode, so not sure if the above is completely correct, here you are
> going in the direction of trying to defend that no easy solution
> exists which needs more thought.

I agree that this phrasing could be misleading. The dangerous part is
in the later actual slot drop. Eventually the database-drop path needs
ReplicationSlotControlLock in exclusive mode to set slot->in_use =
false. If the slotsync holds ReplicationSlotControlLock in shared mode
while it tries to take DB AccessShareLock. Then this would lead to a
deadlock. On second thought, I’m inclined not to add a comment
explaining the constraint here, since it would be difficult to do so
concisely and it's not directly related to the issue.

> Fujii-San's proposal was better but
> there also we may need to be a bit more specific about "That could
> cause an incorrect cleanup decision ...", otherwise, it makes the
> comment unclear.

OK, how about elaborate it a bit like this:

/*
 * In the small window between getting the slot to drop and
 * locking the database, there is a possibility of a parallel
 * database drop by the startup process and the creation of a new
 * slot by the user. This new user-created slot may end up using
 * the same shared memory as that of 'local_slot'.
 *
 * If that happens, local_slot now describes the replacement slot:
 * local_sync_slot_required() may have made its drop decision using
 * the replacement slot's name or invalidation state, and slot_database
 * may refer to the replacement slot's database. Thus check if
 * local_slot is still a synced slot before performing the actual drop.
 * This does not prove it is the original slot, but it prevents dropping
 * an ordinary user-created replacement slot, and the copied database OID
 * keeps lock/unlock symmetric. The remaining risk is limited to this
 * cleanup cycle, such as briefly holding an unrelated database lock, and
 * is acceptable here because this race is rare.
 */

> I am planning to commit and backpatch the fix for the first problem
> based on what Hou-San has shared (v2-*), then we can discuss how to
> improve the existing comments and if we agree on something, that can
> be a HEAD-only patch.

Thanks!


--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.

view thread (27+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Fix race in ReplicationSlotRelease for ephemeral slots
  In-Reply-To: <CABPTF7VxwhYAAFg+71jOg_MQLcJaPfPu40ADC8imyXwr-H967w@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox