From: Amit Kapila <[email protected]>
To: shveta malik <[email protected]>
Cc: Zhijie Hou (Fujitsu) <[email protected]>
Cc: Ashutosh Sharma <[email protected]>
Cc: Ajin Cherian <[email protected]>
Cc: SATYANARAYANA NARLAPURAM <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
Date: Mon, 8 Jun 2026 15:21:44 +0530
Message-ID: <CAA4eK1JLwL_aauy+nrrobC3Sftv369N3N6DaidBpNJdiOwXG+g@mail.gmail.com> (raw)
In-Reply-To: <CAJpy0uCnasi4MSQB=nrjPSv4U_0rb2Z-cg_wazUGQ-P_VnRZeA@mail.gmail.com>
References: <CAJpy0uBT8JbEGE0xvm-Wxh1g_VVgC=whKqChZo-uB+VOa_YCTw@mail.gmail.com>
	<CAE9k0Pkk6q72X3Rc3MUo7PxU46UcCzLfMhM02PGDUmAue9cDGg@mail.gmail.com>
	<TY4PR01MB17718104B91F2945BE727467694102@TY4PR01MB17718.jpnprd01.prod.outlook.com>
	<CAE9k0P=dgCEaKE6+vSCQp8TgrYOi_RqQkDTScdWzFSECsPQn9w@mail.gmail.com>
	<TY4PR01MB17718F364B2CB8108B34FA5FC94112@TY4PR01MB17718.jpnprd01.prod.outlook.com>
	<CAJpy0uCnasi4MSQB=nrjPSv4U_0rb2Z-cg_wazUGQ-P_VnRZeA@mail.gmail.com>

On Fri, Jun 5, 2026 at 9:06 AM shveta malik <[email protected]> wrote:
>
> On Fri, Jun 5, 2026 at 8:34 AM Zhijie Hou (Fujitsu)
> <[email protected]> wrote:
> >
> > On Thursday, June 4, 2026 5:27 PM Ashutosh Sharma <[email protected]> wrote:
> > > On Thu, Jun 4, 2026 at 1:54 PM Zhijie Hou (Fujitsu)
> > > <[email protected]> wrote:
> > > >
> > > > On Thursday, June 4, 2026 3:36 PM Ashutosh Sharma <[email protected]> wrote:
> > > > > On Thu, Jun 4, 2026 at 9:14 AM shveta malik <[email protected]> wrote:
> > > > > > My preference, and original intent, was to accept duplicate entries
> > > > > > and skip them internally. Doc can be updated to say 'duplicate entries
> > > > > > are skipped'. A server startup failure due to duplicate entries in a
> > > > > > GUC does not seem right to me. If the alter-system command fails due
> > > > > > to duplicate entries, that is still fine, but a startup failure seems
> > > > > > excessive. But let's see what others have to say on this.
> > > > > >
> > > > >
> > > > > Okay, the attached patch adds the capability to automatically remove
> > > > > duplicate entries from the synchronized_standby_slots list.
> > > >
> > > > Thanks for updating the patch.
> > > >
> > > > I agree with Shveta that reporting an ERROR is not ideal. I also think it (ERROR) would
> > > > be inconsistent with existing GUCs, as most of them, such as
> > > > synchronous_standby_names, search_path, and session_preload_libraries, do not
> > > > enforce uniqueness.
> > > >
> > > > The most similar GUC, synchronous_standby_names, also clarifies this in the
> > > > documentation:
> > > >
> > > >         "There is no mechanism to enforce uniqueness of standby names. In case of
> > > >         duplicates one of the matching standbys will be considered as higher priority,
> > > >         though exactly which one is indeterminate."[1]
> > > >
> > > > > In N of M
> > > > > mode, if N > M after removing duplicate entries, an error is raised.
> > > >
> > > > I'm not entirely sure about this case. It seems similar to when the number of
> > > > specified slots is less than N (in ANY N or FIRST N), given that we want to skip
> > > > duplicate slots. In that situation, the natural behavior to me would be to
> > > > simply block replication rather than raise an error. And
> > > > synchronous_standby_names would also simply block the transaction in this case.
> > > >
> > >
> > > For duplicate entries themselves, I agree with the direction of not
> > > raising an error. Silently normalizing duplicates is reasonable for
> > > this GUC, especially if we document it clearly. A repeated slot name
> > > does not add any new information, so treating it as “same slot listed
> > > twice by mistake” is practical.
> > >
> > > But for N > M after deduplication, I would still lean toward raising an error.
> > >
> > > Why I’d separate those cases:
> > >
> > > 1) Duplicate entries look like a harmless normalization problem. ANY
> > > 2 (a, a, b) can be normalized to ANY 2 (a, b) without changing the
> > > user’s apparent intent much.
> > >
> > > 2) N > M after deduplication is not a transient runtime state. ANY 2
> > > (a, a) becomes one unique slot. That configuration can never succeed
> > > unless the config itself changes. Blocking forever turns a static
> > > configuration mistake into an operational liveness problem.
> > >
> > > 3) N > M after deduplication is different from ordinary “not enough
> > > standbys are currently available”. If we configure ANY 2 (a, b) and
> > > only a is currently caught up, blocking makes sense because the
> > > situation may resolve at runtime. If we configure ANY 2 (a, a) and
> > > duplicates are ignored, there is no possible future runtime in which
> > > it succeeds without editing the GUC. That is why I think erroring is
> > > better.
> > >
> > > On the synchronous_standby_names comparison, I do not think it is
> > > fully analogous. The quoted documentation is about there being no
> > > reliable way to enforce uniqueness of standby names in the live
> > > system, because those names are matched against runtime standbys and
> > > the result can be indeterminate. Here, synchronized_standby_slots
> > > names concrete replication slots, which are stable object identifiers.
> > > Duplicate config entries are detectable and normalizable
> > > deterministically at GUC parse time. That gives us a cleaner option
> > > than synchronous_standby_names has.
> >
> > Thanks for the explanation.
> >
> > What I was wondering is: ignoring duplicates, what should be the behavior of
> > "ANY 2 (standby)" when N > M?
> >
> > I studied the behavior of synchronous_standby_names a bit to understand the
> > difference. synchronous_standby_names does support syntax like "ANY 2 (standby)"
> > where N > M. Because even in that case, a transaction can still commit if there
> > are two standbys with the same name ("standby" in this example). I'm not sure
> > how common that use case is, but it may explain why no error is reported.
> >
> > Given that, I'm not opposed to reporting an error in synchronized_standby_slots
> > when N > M. The situation is different here since there cannot be two slots with
> > the same name, making this a completely invalid use case.
> >
>
> I also think we can report an error when N > M.
>

+1 for reporting an ERROR for this case.
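
To make the behavior discussed above concrete, here is a minimal sketch in C.
It is not taken from the patch: dedup_slot_names() and check_slot_quorum() are
hypothetical names used only for illustration, and the real check would live
in the GUC check hook and report the failure through the usual error
machinery. The sketch only shows the two rules: duplicates are dropped
silently, and N > M after deduplication is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compact names[] in place, keeping the first
 * occurrence of each slot name. Returns the number of unique names (M).
 */
static size_t
dedup_slot_names(const char **names, size_t count)
{
    size_t      unique = 0;

    assert(names != NULL || count == 0);

    for (size_t i = 0; i < count; i++)
    {
        bool        seen = false;

        for (size_t j = 0; j < unique; j++)
        {
            if (strcmp(names[i], names[j]) == 0)
            {
                seen = true;
                break;
            }
        }

        if (!seen)
            names[unique++] = names[i];
    }

    return unique;
}

/*
 * Hypothetical check: deduplicate the list, then accept the setting only
 * if 1 <= N <= M. A false result corresponds to the case where an ERROR
 * would be reported instead of blocking forever.
 */
static bool
check_slot_quorum(int num_required, const char **names, size_t count,
                  size_t *num_unique)
{
    *num_unique = dedup_slot_names(names, count);

    return num_required >= 1 && (size_t) num_required <= *num_unique;
}
```

With this sketch, ANY 2 (a, a, b) normalizes to two unique slots and is
accepted, while ANY 2 (a, a) collapses to one unique slot and is rejected.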

-- 
With Regards,
Amit Kapila.