public inbox for [email protected]  
Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
11+ messages / 5 participants

* Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
@ 2026-02-26 04:58 Ashutosh Sharma <[email protected]>
  2026-02-26 06:19 ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Amit Kapila <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: Ashutosh Sharma @ 2026-02-26 04:58 UTC
  To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: pgsql-hackers; PostgreSQL Hackers <[email protected]>

Hi,


On Wed, Feb 25, 2026 at 7:21 PM Ashutosh Sharma <[email protected]> wrote:
>
> Hi Satya,
>
> On Wed, Feb 25, 2026 at 3:38 AM SATYANARAYANA NARLAPURAM
> <[email protected]> wrote:
> >
> >
> > Hi hackers,
> >
> > synchronized_standby_slots requires that every physical slot listed in the GUC has caught up before a logical failover slot is allowed to proceed with decoding. This is an ALL-of-N semantic. This availability model for logical slots does not align with the quorum replication semantics of synchronous_standby_names, which can be configured for quorum commit (ANY M of N).
> >
> > In a typical 3-node HA deployment with quorum sync rep:
> >
> > Primary, standby1 (corresponds to sb1_slot), standby2 (corresponds to sb2_slot)
> > synchronized_standby_slots = 'sb1_slot, sb2_slot'
> > synchronous_standby_names = 'ANY 1 (standby1, standby2)'
> >
> > If standby1 goes down, synchronous commits still succeed because standby2 satisfies the quorum. However, logical decoding blocks indefinitely in WaitForStandbyConfirmation(), waiting for sb1_slot (corresponding to standby1) to catch up, even though the transaction has already been safely committed on a quorum of synchronous standbys. This prevents logical decoding consumers from making progress and is inconsistent with the availability guarantee the DBA intended by choosing quorum commit.
>
> +1. This can indeed be a blocker for failover-enabled logical
> replication. It not only has the potential to disrupt logical
> replication, but can also impact the primary server. Over time, it may
> silently lead to significant WAL accumulation on the primary,
> eventually causing disk-full scenarios and degrading the performance
> of applications running on the primary instance. Therefore, I too
> strongly believe this needs to be addressed to prevent such
> potentially disruptive situations.
>
> >
> >
> > Proposal:
> >
> > Make synchronized_standby_slots quorum-aware, i.e. extend the GUC to accept an ANY M (slot1, slot2, ...) syntax similar to synchronous_standby_names, so that StandbySlotsHaveCaughtup() can return true when M of N slots (where 1 <= M <= N) have caught up. I still prefer two different GUCs for this, as the list of slots to be synchronized can still be different (for example, a DBA may want to ensure that a geo standby is in sync before allowing the logical decoding client to read the changes). I kept the synchronized_standby_slots parsing logic similar to that of synchronous_standby_names to keep things simple. The default behavior of synchronized_standby_slots is unchanged.
> >
>
> Thank you for the proposal. I can spend some time reviewing the
> changes and help take this forward. I would also be happy to hear
> others' thoughts and feedback on the proposal.
>
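
A minimal sketch of the proposed M-of-N check, assuming each listed slot has already been reduced to the WAL position it has confirmed. XLogRecPtr mirrors PostgreSQL's 64-bit WAL-position type, but slots_have_caught_up() and its parameters are hypothetical; this is not the actual signature or body of StandbySlotsHaveCaughtup() or of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* WAL position; PostgreSQL defines XLogRecPtr as a 64-bit unsigned integer. */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper: returns true when at least num_required of the
 * num_slots physical slots have confirmed WAL up to wait_for_lsn.
 * num_required == num_slots reproduces the current ALL-of-N behavior;
 * 1 <= num_required < num_slots gives the proposed ANY M (...) behavior.
 */
static bool
slots_have_caught_up(const XLogRecPtr *confirmed_lsns, int num_slots,
                     int num_required, XLogRecPtr wait_for_lsn)
{
    int caught_up = 0;

    assert(num_required >= 1 && num_required <= num_slots);

    for (int i = 0; i < num_slots; i++)
    {
        if (confirmed_lsns[i] >= wait_for_lsn)
            caught_up++;
        /* Stop as soon as the quorum is reached. */
        if (caught_up >= num_required)
            return true;
    }
    return false;
}
```

For example, with sb1_slot stuck at 0x1000 (standby1 down) and sb2_slot at 0x3000, waiting for 0x2000 returns false with num_required = 2 (today's ALL-of-N behavior) and true with num_required = 1 (ANY 1).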

Thinking about this further, using quorum settings for
synchronized_standby_slots can/will certainly result in at least one
sync standby lagging behind the logical replica, making it probably
impossible to continue with the existing logical replication setup
after a failover to the standby that lags behind. Here is what I
mean:

Let's say we have 2 synchronous standbys with
"synchronized_standby_slots" configured as ANY 1 (sync_standby1,
sync_standby2). With this quorum setting, WAL only needs to be
confirmed by any one of the two standbys before it can be forwarded to
the logical replica. Now consider a scenario where sync_standby1 is
ahead of sync_standby2: new WAL is confirmed by sync_standby1 and
subsequently delivered to the logical replica. If sync_standby1 then
goes down and we fail over to sync_standby2, the new primary will be at
a lower LSN than the logical replica, since sync_standby2 never
received that WAL. At this point, the logical replication slot on the
new primary is essentially stale, and the logical replication setup
that existed before the failover cannot be resumed. Hence, I think
it's important to ensure that the WAL (including all the necessary
data needed for logical replication) gets delivered to all the
servers/slots specified in synchronized_standby_slots before it gets
delivered to the logical replica.
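
The failover hazard described above can be modeled in a few lines, assuming each standby is summarized by a single confirmed LSN. Under ANY M, logical decoding may proceed up to the M-th highest confirmed LSN, whereas under the current ALL-of-N rule it is capped at the lowest. The function names are hypothetical and do not correspond to PostgreSQL internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t XLogRecPtr;

/* qsort comparator: sorts LSNs in descending order. */
static int
lsn_cmp_desc(const void *a, const void *b)
{
    XLogRecPtr x = *(const XLogRecPtr *) a;
    XLogRecPtr y = *(const XLogRecPtr *) b;

    return (x < y) - (x > y);
}

/*
 * Hypothetical: highest LSN that at least num_required of the num_slots
 * standbys have confirmed, i.e. the num_required-th largest value.
 * With num_required == num_slots this is the minimum (ALL-of-N).
 */
static XLogRecPtr
quorum_confirmed_lsn(const XLogRecPtr *lsns, int num_slots, int num_required)
{
    XLogRecPtr sorted[64];

    assert(num_slots <= 64);
    assert(num_required >= 1 && num_required <= num_slots);
    memcpy(sorted, lsns, sizeof(XLogRecPtr) * num_slots);
    qsort(sorted, num_slots, sizeof(XLogRecPtr), lsn_cmp_desc);
    return sorted[num_required - 1];
}

/*
 * After promotion, the existing logical replication setup can only be
 * resumed if the new primary already has all WAL the logical replica
 * has received.
 */
static bool
can_resume_logical_after_failover(XLogRecPtr new_primary_lsn,
                                  XLogRecPtr logical_replica_lsn)
{
    return new_primary_lsn >= logical_replica_lsn;
}
```

With sync_standby1 at 0x3000 and sync_standby2 at 0x2000, ANY 1 lets WAL up to 0x3000 reach the logical replica, so promoting sync_standby2 leaves the new primary behind it; with ALL, decoding would have stopped at 0x2000 and the same failover would be safe.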

While I agree that not allowing quorum-like settings for this has the
potential to accumulate WAL and impact logical replication, I think we
can explore other ways to mitigate that concern separately.

Let's see what experts have to say on this.

--
With Regards,
Ashutosh Sharma.


* Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
  2026-02-26 04:58 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
@ 2026-02-26 06:19 ` Amit Kapila <[email protected]>
  2026-02-26 07:42   ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 08:02   ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication SATYANARAYANA NARLAPURAM <[email protected]>
  0 siblings, 2 replies; 11+ messages in thread

From: Amit Kapila @ 2026-02-26 06:19 UTC
  To: Ashutosh Sharma <[email protected]>; +Cc: SATYANARAYANA NARLAPURAM <[email protected]>; pgsql-hackers; PostgreSQL Hackers <[email protected]>

On Thu, Feb 26, 2026 at 10:28 AM Ashutosh Sharma <[email protected]> wrote:
>
>
> > >
> > > Proposal:
> > >
> > > Make synchronized_standby_slots quorum-aware, i.e. extend the GUC to accept an ANY M (slot1, slot2, ...) syntax similar to synchronous_standby_names, so that StandbySlotsHaveCaughtup() can return true when M of N slots (where 1 <= M <= N) have caught up. I still prefer two different GUCs for this, as the list of slots to be synchronized can still be different (for example, a DBA may want to ensure that a geo standby is in sync before allowing the logical decoding client to read the changes). I kept the synchronized_standby_slots parsing logic similar to that of synchronous_standby_names to keep things simple. The default behavior of synchronized_standby_slots is unchanged.
> > >
...
>
> Thinking about this further, using quorum settings for
> synchronized_standby_slots can/will certainly result in at least one
> sync standby lagging behind the logical replica, making it probably
> impossible to continue with the existing logical replication setup
> after a failover to the standby that lags behind. Here is what I
> mean:
>

But won't that be true even for synchronous_standby_names? I think in
the case of quorum, it is the responsibility of the failover solution
to select the most recent synced standby among all the standbys
specified in synchronous_standby_names. Similarly, here, before failing
over the logical subscriber to one of the physical standbys, the failover
tool needs to ensure it is switching over to the synced replica. We have
given steps in the docs [1] that could be used to identify the replica
to which the subscriber can switch over. Will that address your concern?
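
A sketch of the selection step described above, assuming the failover tool already knows each candidate standby's flushed LSN, whether its failover slots are synced, and the LSN up to which the logical subscriber has received changes. The struct and function are hypothetical; the actual readiness checks are the ones given in the docs [1], which this sketch does not reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical summary of what a failover tool knows about one standby. */
typedef struct StandbyCandidate
{
    const char *name;
    XLogRecPtr  flush_lsn;      /* WAL flushed on this standby */
    bool        slots_synced;   /* failover slots synced and usable */
} StandbyCandidate;

/*
 * Return the index of the most advanced standby that is safe for the
 * logical subscriber to follow, or -1 if none qualifies. Here a standby
 * is safe if its failover slots are synced and it already has at least
 * the WAL the subscriber has received (subscriber_lsn).
 */
static int
pick_failover_target(const StandbyCandidate *c, int n, XLogRecPtr subscriber_lsn)
{
    int best = -1;

    assert(n >= 0);

    for (int i = 0; i < n; i++)
    {
        if (!c[i].slots_synced || c[i].flush_lsn < subscriber_lsn)
            continue;
        if (best < 0 || c[i].flush_lsn > c[best].flush_lsn)
            best = i;
    }
    return best;
}
```

Picking the most advanced qualifying standby mirrors the rule for quorum synchronous replication mentioned above; a candidate that is behind the subscriber is rejected even if its slots are synced.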

BTW, I have also suggested this idea in thread [2]. I don't recall all
the ideas/points discussed in that thread but it would be good to
check that thread for any alternative ideas and points raised, so that
we don't miss anything.

[1] - https://www.postgresql.org/docs/current/logical-replication-failover.html
[2] - https://www.postgresql.org/message-id/CAA4eK1KLFdmj8CLrZNL0D4phqyQihb7NXOjmqvrU5DT8moQn9Q%40mail.gma...

-- 
With Regards,
Amit Kapila.


* Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
  2026-02-26 04:58 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 06:19 ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Amit Kapila <[email protected]>
@ 2026-02-26 07:42   ` Ashutosh Sharma <[email protected]>
  2026-02-26 08:23     ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication SATYANARAYANA NARLAPURAM <[email protected]>
  1 sibling, 1 reply; 11+ messages in thread

From: Ashutosh Sharma @ 2026-02-26 07:42 UTC
  To: Amit Kapila <[email protected]>; +Cc: SATYANARAYANA NARLAPURAM <[email protected]>; pgsql-hackers; PostgreSQL Hackers <[email protected]>

Hi Amit,

On Thu, Feb 26, 2026 at 11:50 AM Amit Kapila <[email protected]> wrote:
>
> On Thu, Feb 26, 2026 at 10:28 AM Ashutosh Sharma <[email protected]> wrote:
> >
> >
> > > >
> > > > Proposal:
> > > >
> > > > Make synchronized_standby_slots quorum-aware, i.e. extend the GUC to accept an ANY M (slot1, slot2, ...) syntax similar to synchronous_standby_names, so that StandbySlotsHaveCaughtup() can return true when M of N slots (where 1 <= M <= N) have caught up. I still prefer two different GUCs for this, as the list of slots to be synchronized can still be different (for example, a DBA may want to ensure that a geo standby is in sync before allowing the logical decoding client to read the changes). I kept the synchronized_standby_slots parsing logic similar to that of synchronous_standby_names to keep things simple. The default behavior of synchronized_standby_slots is unchanged.
> > > >
> ...
> >
> > Thinking about this further, using quorum settings for
> > synchronized_standby_slots can/will certainly result in at least one
> > sync standby lagging behind the logical replica, making it probably
> > impossible to continue with the existing logical replication setup
> > after a failover to the standby that lags behind. Here is what I
> > mean:
> >
>
> But won't that be true even for synchronous_standby_names? I think in
> the case of quorum, it is the responsibility of the failover solution
> to select the most recent synced standby among all the standbys
> specified in synchronous_standby_names. Similarly, here, before failing
> over the logical subscriber to one of the physical standbys, the failover
> tool needs to ensure it is switching over to the synced replica. We have
> given steps in the docs [1] that could be used to identify the replica
> to which the subscriber can switch over. Will that address your concern?
>

Here's my understanding of this:

I don't think we should be comparing "synchronous_standby_names" with
"synchronized_standby_slots", even though they appear similar in
purpose. All values listed in synchronous_standby_names represent
synchronous standbys exclusively, whereas synchronized_standby_slots
can hold values for both synchronous and asynchronous standbys. In
other words, every server referenced by synchronous_standby_names is
of the same type, but that may not be the case with
synchronized_standby_slots.

If a GUC can hold values of different types (sync vs. async), does it
really make sense to use a qualifier like ANY 1 (val1, val2) when val1
and val2 are different in nature? For example, suppose val1 is a
synchronous standby and val2 is an asynchronous standby, and we
configure ANY 1 (val1, val2). It's possible for val2 to get ahead of
val1 in terms of replication progress, which in turn could mean the
logical replica is also ahead of val1. So if we were to fail over to
val1 (since it's the only synchronous standby), we would not be able to
use the existing logical replication setup.

Please correct me if I have misunderstood anything here.

--
With Regards,
Ashutosh Sharma.


* Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
  2026-02-26 04:58 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 06:19 ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Amit Kapila <[email protected]>
  2026-02-26 07:42   ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
@ 2026-02-26 08:23     ` SATYANARAYANA NARLAPURAM <[email protected]>
  2026-02-26 08:45       ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication shveta malik <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2026-02-26 08:23 UTC
  To: Ashutosh Sharma <[email protected]>; +Cc: Amit Kapila <[email protected]>; pgsql-hackers; PostgreSQL Hackers <[email protected]>

Hi Ashutosh,

On Wed, Feb 25, 2026 at 11:42 PM Ashutosh Sharma <[email protected]>
wrote:

>
> I don't think we should be comparing "synchronous_standby_names" with
> "synchronized_standby_slots", even though they appear similar in
> purpose. All values listed in synchronous_standby_names represent
> synchronous standbys exclusively, whereas synchronized_standby_slots
> can hold values for both synchronous and asynchronous standbys. In
> other words, every server referenced by synchronous_standby_names is
> of the same type, but that may not be the case with
> synchronized_standby_slots.
>
> If a GUC can hold values of different types (sync vs. async), does it
> really make sense to use a qualifier like ANY 1 (val1, val2) when val1
> and val2 are different in nature? For example, suppose val1 is a
> synchronous standby and val2 is an asynchronous standby, and we
> configure ANY 1 (val1, val2). It's possible for val2 to get ahead of
> val1 in terms of replication progress, which in turn could mean the
> logical replica is also ahead of val1. So if we were to fail over to
> val1 (since it's the only synchronous standby), we would not be able to
> use the existing logical replication setup.
>

If the failover orchestrator cannot ensure that standby1 receives the
quorum-committed WAL (from the archive or from standby2), then the
setting ANY 1 (val1, val2) is invalid.
This setup also has issues because, in your scenario, standby2 is ahead
of the new primary (standby1) and now needs to be rewound to get back
in sync with the new primary. Additionally, it allowed readers to read
data that was lost at the end of the failover. Ideally, we need a
feature (similar to synchronized_standby_slots) that does not send WAL
to async replicas before the sync replicas have confirmed it (honoring
the synchronous_standby_names GUC). It could be a thread of its own.
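
A rough sketch of the mechanism suggested above, assuming the primary knows, for each walsender, whether it serves a synchronous standby and the LSN already confirmed according to synchronous_standby_names. wal_send_limit() is hypothetical; this is not existing PostgreSQL behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Hypothetical: how far a walsender may ship WAL to its standby.
 *   primary_flush      - WAL flushed locally on the primary
 *   sync_confirmed_lsn - WAL already confirmed by the synchronous standbys
 *                        as required by synchronous_standby_names
 */
static XLogRecPtr
wal_send_limit(bool is_sync_standby, XLogRecPtr primary_flush,
               XLogRecPtr sync_confirmed_lsn)
{
    /* Synchronous standbys must receive new WAL in order to confirm it. */
    if (is_sync_standby)
        return primary_flush;

    /* Async standby: never get ahead of what the sync quorum confirmed. */
    return primary_flush < sync_confirmed_lsn ? primary_flush
                                              : sync_confirmed_lsn;
}
```

Only asynchronous standbys are capped: if synchronous standbys were held back to the LSN they had already confirmed, they could never confirm anything newer and commits would wait indefinitely.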


* Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
  2026-02-26 04:58 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 06:19 ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Amit Kapila <[email protected]>
  2026-02-26 07:42   ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 08:23     ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication SATYANARAYANA NARLAPURAM <[email protected]>
@ 2026-02-26 08:45       ` shveta malik <[email protected]>
  2026-02-26 09:11         ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 09:29         ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Alexander Kukushkin <[email protected]>
  0 siblings, 2 replies; 11+ messages in thread

From: shveta malik @ 2026-02-26 08:45 UTC
  To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: Ashutosh Sharma <[email protected]>; Amit Kapila <[email protected]>; pgsql-hackers; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]>

On Thu, Feb 26, 2026 at 1:54 PM SATYANARAYANA NARLAPURAM
<[email protected]> wrote:
>
> Hi Ashutosh,
>
> On Wed, Feb 25, 2026 at 11:42 PM Ashutosh Sharma <[email protected]> wrote:
>>
>>
>> I don't think we should be comparing "synchronous_standby_names" with
>> "synchronized_standby_slots", even though they appear similar in
>> purpose. All values listed in synchronous_standby_names represent
>> synchronous standbys exclusively, whereas synchronized_standby_slots
>> can hold values for both synchronous and asynchronous standbys. In
>> other words, every server referenced by synchronous_standby_names is
>> of the same type, but that may not be the case with
>> synchronized_standby_slots.
>>
>> If a GUC can hold values of different types (sync vs. async), does it
>> really make sense to use a qualifier like ANY 1 (val1, val2) when val1
>> and val2 are different in nature? For example, suppose val1 is a
>> synchronous standby and val2 is an asynchronous standby, and we
>> configure ANY 1 (val1, val2). It's possible for val2 to get ahead of
>> val1 in terms of replication progress, which in turn could mean the
>> logical replica is also ahead of val1. So if we were to fail over to
>> val1 (since it's the only synchronous standby), we would not be able to
>> use the existing logical replication setup.
>
>
> If the failover orchestrator cannot ensure that standby1 receives the quorum-committed WAL (from the archive or from standby2), then the setting ANY 1 (val1, val2) is invalid.
> This setup also has issues because, in your scenario, standby2 is ahead of the new primary (standby1) and now needs to be rewound to get back in sync with the new primary. Additionally, it allowed readers to read data that was lost at the end of the failover. Ideally, we need a feature (similar to synchronized_standby_slots) that does not send WAL to async replicas before the sync replicas have confirmed it (honoring the synchronous_standby_names GUC). It could be a thread of its own.


+1 on the overall idea of the patch.
I understand the concern raised above that one of the standbys in the
quorum (synchronized_standby_slots) might lag behind the logical
replica, and a user could potentially fail over to such a standby. But
I also agree with Amit that configuring failover correctly is
ultimately the responsibility of the failover solution, and the
instructions in the docs should be followed before deciding whether a
standby is failover-ready.

As suggested in [1], IMO, it is a reasonably good idea for
'synchronized_standby_slots' to DEFAULT to the value of
'synchronous_standby_names'. That way, even if the user forgets to
configure 'synchronized_standby_slots' explicitly, we would still have
reasonable protection in place. At the same time, if a user
intentionally chooses not to configure it, a NULL/NONE value should
remain a valid option.

[1]: https://www.postgresql.org/message-id/CAJpy0uCZ04ZQFHs-tV5LprkYtSSwtBtUJW4O%3D0S01yc%2BTRw7EQ%40mail...

Thanks,
Shveta


* Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
  2026-02-26 04:58 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 06:19 ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Amit Kapila <[email protected]>
  2026-02-26 07:42   ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 08:23     ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication SATYANARAYANA NARLAPURAM <[email protected]>
  2026-02-26 08:45       ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication shveta malik <[email protected]>
@ 2026-02-26 09:11         ` Ashutosh Sharma <[email protected]>
  2026-02-26 10:46           ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication SATYANARAYANA NARLAPURAM <[email protected]>
  1 sibling, 1 reply; 11+ messages in thread

From: Ashutosh Sharma @ 2026-02-26 09:11 UTC
  To: shveta malik <[email protected]>; +Cc: SATYANARAYANA NARLAPURAM <[email protected]>; Amit Kapila <[email protected]>; pgsql-hackers; PostgreSQL Hackers <[email protected]>

Hi,

On Thu, Feb 26, 2026 at 2:15 PM shveta malik <[email protected]> wrote:
>
> On Thu, Feb 26, 2026 at 1:54 PM SATYANARAYANA NARLAPURAM
> <[email protected]> wrote:
> >
> > Hi Ashutosh,
> >
> > On Wed, Feb 25, 2026 at 11:42 PM Ashutosh Sharma <[email protected]> wrote:
> >>
> >>
> >> I don't think we should be comparing "synchronous_standby_names" with
> >> "synchronized_standby_slots", even though they appear similar in
> >> purpose. All values listed in synchronous_standby_names represent
> >> synchronous standbys exclusively, whereas synchronized_standby_slots
> >> can hold values for both synchronous and asynchronous standbys. In
> >> other words, every server referenced by synchronous_standby_names is
> >> of the same type, but that may not be the case with
> >> synchronized_standby_slots.
> >>
> >> If a GUC can hold values of different types (sync vs. async), does it
> >> really make sense to use a qualifier like ANY 1 (val1, val2) when val1
> >> and val2 are different in nature? For example, suppose val1 is a
> >> synchronous standby and val2 is an asynchronous standby, and we
> >> configure ANY 1 (val1, val2). It's possible for val2 to get ahead of
> >> val1 in terms of replication progress, which in turn could mean the
> >> logical replica is also ahead of val1. So if we were to fail over to
> >> val1 (since it's the only synchronous standby), we would not be able to
> >> use the existing logical replication setup.
> >
> >
> > If the failover orchestrator cannot ensure that standby1 receives the quorum-committed WAL (from the archive or from standby2), then the setting ANY 1 (val1, val2) is invalid.
> > This setup also has issues because, in your scenario, standby2 is ahead of the new primary (standby1) and now needs to be rewound to get back in sync with the new primary. Additionally, it allowed readers to read data that was lost at the end of the failover. Ideally, we need a feature (similar to synchronized_standby_slots) that does not send WAL to async replicas before the sync replicas have confirmed it (honoring the synchronous_standby_names GUC). It could be a thread of its own.
>
>
> +1 on the overall idea of the patch.
> I understand the concern raised above that one of the standbys in the
> quorum (synchronized_standby_slots) might lag behind the logical
> replica, and a user could potentially fail over to such a standby. But
> I also agree with Amit that configuring failover correctly is
> ultimately the responsibility of the failover solution, and the
> instructions in the docs should be followed before deciding whether a
> standby is failover-ready.
>
> As suggested in [1], IMO, it is a reasonably good idea for
> 'synchronized_standby_slots' to DEFAULT to the value of
> 'synchronous_standby_names'. That way, even if the user forgets to
> configure 'synchronized_standby_slots' explicitly, we would still have
> reasonable protection in place. At the same time, if a user
> intentionally chooses not to configure it, a NULL/NONE value should
> remain a valid option.
>

AFAIU, not all names listed in "synchronous_standby_names" are
necessarily synchronous standbys. Tools like pg_receivewal, for
example, can establish a replication connection to the primary and
appear in that list. Therefore, deriving "synchronized_standby_slots"
from "synchronous_standby_names" when it is not set by the user would
cause logical slots to be synchronized to whatever nodes those names
represent, including a host running pg_receivewal, which is certainly
not something the user would have intended. So I feel this might not
be a good choice.

--
With Regards,
Ashutosh Sharma.


* Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
  2026-02-26 04:58 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 06:19 ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Amit Kapila <[email protected]>
  2026-02-26 07:42   ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 08:23     ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication SATYANARAYANA NARLAPURAM <[email protected]>
  2026-02-26 08:45       ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication shveta malik <[email protected]>
  2026-02-26 09:11         ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
@ 2026-02-26 10:46           ` SATYANARAYANA NARLAPURAM <[email protected]>
  2026-05-21 09:12             ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  0 siblings, 1 reply; 11+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2026-02-26 10:46 UTC
  To: Ashutosh Sharma <[email protected]>; +Cc: shveta malik <[email protected]>; Amit Kapila <[email protected]>; pgsql-hackers; PostgreSQL Hackers <[email protected]>

Hi Ashutosh,

On Thu, Feb 26, 2026 at 1:11 AM Ashutosh Sharma <[email protected]>
wrote:

> Hi,
>
> On Thu, Feb 26, 2026 at 2:15 PM shveta malik <[email protected]>
> wrote:
> >
> > On Thu, Feb 26, 2026 at 1:54 PM SATYANARAYANA NARLAPURAM
> > <[email protected]> wrote:
> > >
> > > Hi Ashutosh,
> > >
> > > On Wed, Feb 25, 2026 at 11:42 PM Ashutosh Sharma <[email protected]> wrote:
> > >>
> > >>
> > >> I don't think we should be comparing "synchronous_standby_names" with
> > >> "synchronized_standby_slots", even though they appear similar in
> > >> purpose. All values listed in synchronous_standby_names represent
> > >> synchronous standbys exclusively, whereas synchronized_standby_slots
> > >> can hold values for both synchronous and asynchronous standbys. In
> > >> other words, every server referenced by synchronous_standby_names is
> > >> of the same type, but that may not be the case with
> > >> synchronized_standby_slots.
> > >>
> > >> If a GUC can hold values of different types (sync vs. async), does it
> > >> really make sense to use a qualifier like ANY 1 (val1, val2) when val1
> > >> and val2 are different in nature? For example, suppose val1 is a
> > >> synchronous standby and val2 is an asynchronous standby, and we
> > >> configure ANY 1 (val1, val2). It's possible for val2 to get ahead of
> > >> val1 in terms of replication progress, which in turn could mean the
> > >> logical replica is also ahead of val1. So if we were to fail over to
> > >> val1 (since it's the only synchronous standby), we would not be able to
> > >> use the existing logical replication setup.
> > >
> > >
> > > If the failover orchestrator cannot ensure that standby1 receives the
> > > quorum-committed WAL (from the archive or from standby2), then the
> > > setting ANY 1 (val1, val2) is invalid.
> > > This setup also has issues because, in your scenario, standby2 is ahead
> > > of the new primary (standby1) and now needs to be rewound to get back
> > > in sync with the new primary. Additionally, it allowed readers to read
> > > data that was lost at the end of the failover. Ideally, we need a
> > > feature (similar to synchronized_standby_slots) that does not send WAL
> > > to async replicas before the sync replicas have confirmed it (honoring
> > > the synchronous_standby_names GUC). It could be a thread of its own.
> >
> >
> > +1 on the overall idea of the patch.
> > I understand the concern raised above that one of the standbys in the
> > quorum (synchronized_standby_slots) might lag behind the logical
> > replica, and a user could potentially fail over to such a standby. But
> > I also agree with Amit that configuring failover correctly is
> > ultimately the responsibility of the failover solution, and the
> > instructions in the docs should be followed before deciding whether a
> > standby is failover-ready.
> >
> > As suggested in [1], IMO, it is a reasonably good idea for
> > 'synchronized_standby_slots' to DEFAULT to the value of
> > 'synchronous_standby_names'. That way, even if the user forgets to
> > configure 'synchronized_standby_slots' explicitly, we would still have
> > reasonable protection in place. At the same time, if a user
> > intentionally chooses not to configure it, a NULL/NONE value should
> > remain a valid option.
> >
>
> AFAIU, not all names listed in "synchronous_standby_names" are
> necessarily synchronous standbys. Tools like pg_receivewal, for
> example, can establish a replication connection to the primary and
> appear in that list. Therefore, deriving "synchronized_standby_slots"
> from "synchronous_standby_names" when it is not set by the user would
> cause logical slots to be synchronized to whatever nodes those names
> represent, including a host running pg_receivewal, which is certainly
> not something the user would have intended. So I feel this might not
> be a good choice.


Agreed, it is not a good idea to have synchronized_standby_slots default
to synchronous_standby_names, because application_name values and slot
names are different things, as stated.

Thanks,
Satya


* Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
  2026-02-26 04:58 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 06:19 ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Amit Kapila <[email protected]>
  2026-02-26 07:42   ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 08:23     ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication SATYANARAYANA NARLAPURAM <[email protected]>
  2026-02-26 08:45       ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication shveta malik <[email protected]>
  2026-02-26 09:11         ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 10:46           ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication SATYANARAYANA NARLAPURAM <[email protected]>
@ 2026-05-21 09:12             ` Ashutosh Sharma <[email protected]>
  0 siblings, 0 replies; 11+ messages in thread

From: Ashutosh Sharma @ 2026-05-21 09:12 UTC
  To: shveta malik <[email protected]>; +Cc: Amit Kapila <[email protected]>; Hou, Zhijie/侯 志杰 <[email protected]>; Ajin Cherian <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>; pgsql-hackers; PostgreSQL Hackers <[email protected]>; shveta malik <[email protected]>

Hi Shveta,

On Fri, May 15, 2026 at 9:28 AM shveta malik <[email protected]> wrote:
>
> Ashutosh, while testing further, I noticed that
> 'synchronized_standby_slots' does not filter duplicate entries. For
> example, if a user lists the same entry twice in a priority
> configuration, we end up waiting on one slot twice rather than
> waiting on 2 different slots.
>

Thank you for raising this concern. It is indeed an issue that needs
fixing. We will ensure it is addressed in the next patch version.
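An illustrative sketch (not taken from the patch, whose parsing is done in C inside PostgreSQL): a hypothetical helper that splits a comma-separated slot list, trims whitespace, skips empty entries, and drops repeated names while keeping the order of first appearance, so that a priority ordering is not disturbed and each distinct slot is counted only once. The function name and the choice to silently drop duplicates rather than reject the setting are assumptions.

```python
def dedup_slot_names(value: str) -> list[str]:
    """Split a comma-separated slot list and drop repeated names.

    Hypothetical helper for illustration only. Order of first appearance
    is preserved, so priority ordering is not disturbed.
    """
    seen: set[str] = set()
    result: list[str] = []
    for raw in value.split(","):
        name = raw.strip()
        if not name:
            continue  # ignore empty entries such as a trailing comma
        if name in seen:
            continue  # already listed: waiting on it twice adds no protection
        seen.add(name)
        result.append(name)
    return result


print(dedup_slot_names("sb1_slot, sb2_slot, sb1_slot"))
```

Whether duplicates should be dropped silently or reported as a configuration error is a separate design decision; the sketch simply drops them.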

--
With Regards,
Ashutosh Sharma.

* Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
  2026-02-26 04:58 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 06:19 ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Amit Kapila <[email protected]>
  2026-02-26 07:42   ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 08:23     ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication SATYANARAYANA NARLAPURAM <[email protected]>
  2026-02-26 08:45       ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication shveta malik <[email protected]>
@ 2026-02-26 09:29         ` Alexander Kukushkin <[email protected]>
  2026-02-26 10:38           ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication SATYANARAYANA NARLAPURAM <[email protected]>
  1 sibling, 1 reply; 11+ messages in thread

From: Alexander Kukushkin @ 2026-02-26 09:29 UTC (permalink / raw)
  To: shveta malik <[email protected]>; +Cc: SATYANARAYANA NARLAPURAM <[email protected]>; Ashutosh Sharma <[email protected]>; Amit Kapila <[email protected]>; pgsql-hackers; PostgreSQL Hackers <[email protected]>

Hi,

On Thu, 26 Feb 2026 at 09:45, shveta malik <[email protected]> wrote:

>
> As suggested in [1], IMO, it is a reasonably good idea for
> 'synchronized_standby_slots' to DEFAULT to the value of
> 'synchronous_standby_names'. That way, even if the user forgot to
> configure 'synchronized_standby_slots' explicitly, we would still have
> reasonable protection in place.


Hmm.
synchronous_standby_names contains application_names,
while synchronized_standby_slots contains names of physical replication
slots.
These are two different things, and in fact sync replication doesn't even
require replication slots.
What is worse, even when all standbys use physical replication slots there
is no guarantee that values in synchronous_standby_names will match
physical slot names.
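To make the mismatch concrete, here is a minimal Python sketch that joins rows shaped like pg_stat_replication (pid, application_name) with rows shaped like pg_replication_slots (slot_name, active_pid) on the walsender PID. The sample values are hypothetical: standby1 streams through a slot whose name differs from its application_name, and standby2 streams without any slot, so the names in synchronous_standby_names cannot simply be reused as slot names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WalSender:
    pid: int               # modeled on pg_stat_replication.pid
    application_name: str  # modeled on pg_stat_replication.application_name


@dataclass
class Slot:
    slot_name: str             # modeled on pg_replication_slots.slot_name
    active_pid: Optional[int]  # modeled on pg_replication_slots.active_pid


def slots_by_application_name(
    senders: list[WalSender], slots: list[Slot]
) -> dict[str, Optional[str]]:
    """Map each connected standby's application_name to the slot it is
    currently using, or None when it streams without a slot."""
    by_pid = {s.active_pid: s.slot_name for s in slots if s.active_pid is not None}
    return {w.application_name: by_pid.get(w.pid) for w in senders}


# Hypothetical sample data, not taken from a real cluster.
senders = [WalSender(4101, "standby1"), WalSender(4102, "standby2")]
slots = [Slot("sb1_slot", 4101)]

print(slots_by_application_name(senders, slots))
```

Only standbys that are currently connected appear in the result, which is one more reason a static default derived from synchronous_standby_names would be unreliable.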

Regards,
--
Alexander Kukushkin



* Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
  2026-02-26 04:58 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 06:19 ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Amit Kapila <[email protected]>
  2026-02-26 07:42   ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 08:23     ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication SATYANARAYANA NARLAPURAM <[email protected]>
  2026-02-26 08:45       ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication shveta malik <[email protected]>
  2026-02-26 09:29         ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Alexander Kukushkin <[email protected]>
@ 2026-02-26 10:38           ` SATYANARAYANA NARLAPURAM <[email protected]>
  0 siblings, 0 replies; 11+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2026-02-26 10:38 UTC (permalink / raw)
  To: Alexander Kukushkin <[email protected]>; +Cc: shveta malik <[email protected]>; Ashutosh Sharma <[email protected]>; Amit Kapila <[email protected]>; pgsql-hackers; PostgreSQL Hackers <[email protected]>

Hi Alexander,

On Thu, Feb 26, 2026 at 1:29 AM Alexander Kukushkin <[email protected]>
wrote:

> Hi,
>
> On Thu, 26 Feb 2026 at 09:45, shveta malik <[email protected]> wrote:
>
>>
>> As suggested in [1], IMO, it is a reasonably good idea for
>> 'synchronized_standby_slots' to DEFAULT to the value of
>> 'synchronous_standby_names'. That way, even if the user forgot to
>> configure 'synchronized_standby_slots' explicitly, we would still have
>> reasonable protection in place.
>
>
> Hmm.
> synchronous_standby_names contains application_names,
> while synchronized_standby_slots contains names of physical replication
> slots.
> These are two different things, and in fact sync replication doesn't even
> require replication slots.
> What is worse, even when all standbys use physical replication slots there
> is no guarantee that values in synchronous_standby_names will match
> physical slot names.
>

That's right, thanks for the reminder. I am convinced that we can't default
synchronized_standby_slots to synchronous_standby_names. What
do you think about the rest of the proposal?

Thanks,
Satya



* Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
  2026-02-26 04:58 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
  2026-02-26 06:19 ` Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Amit Kapila <[email protected]>
@ 2026-02-26 08:02   ` SATYANARAYANA NARLAPURAM <[email protected]>
  1 sibling, 0 replies; 11+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2026-02-26 08:02 UTC (permalink / raw)
  To: Amit Kapila <[email protected]>; +Cc: Ashutosh Sharma <[email protected]>; pgsql-hackers; PostgreSQL Hackers <[email protected]>

Hi Amit,

On Wed, Feb 25, 2026 at 10:20 PM Amit Kapila <[email protected]>
wrote:

> ...
> >
> > Thinking about this further, using quorum settings for
> > synchronized_standby_slots can, and certainly will, result in at least one
> > sync standby lagging behind the logical replica, which may make it
> > impossible to continue with the existing logical replication setup
> > after a failover to the lagging standby. Here is what I
> > mean:
> >
>
> But won't that be true even for synchronous_standby_names? I think in
> the case of quorum, it is the responsibility of the failover solution
> to select the most recent synced standby among all the standbys
> specified in synchronous_standby_names. Similarly here, before failing
> over a logical subscriber to one of the physical standbys, the failover
> tool needs to ensure it is switching over to the synced replica. We have
> given steps in the docs [1] that could be used to identify the replica
> where the subscriber can switch over. Will that address your concern?
>

+1, the job of failover orchestration is to ensure the new primary is
caught up at least to the quorum LSN. Otherwise, there is a durability
issue: users may find that committed transactions are missing.
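As a rough illustration of that orchestration check, the Python sketch below assumes the failover tool already knows the quorum LSN (how it learns it is out of scope here) and has collected the flushed LSN of each reachable standby in PostgreSQL's textual "X/Y" hex form. It picks the most advanced reachable standby and refuses to promote anything that is behind the quorum LSN. The function names and inputs are hypothetical, not part of PostgreSQL or of any specific failover tool.

```python
from typing import Optional


def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's textual LSN form 'X/Y' (two hex numbers) to an int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def pick_failover_target(flush_lsns: dict[str, str], quorum_lsn: str) -> Optional[str]:
    """Return the most advanced reachable standby, or None if even that one
    has not flushed WAL up to the quorum LSN (promoting it could lose
    commits already acknowledged to clients)."""
    if not flush_lsns:
        return None
    best = max(flush_lsns, key=lambda name: parse_lsn(flush_lsns[name]))
    if parse_lsn(flush_lsns[best]) < parse_lsn(quorum_lsn):
        return None
    return best


# Hypothetical values: standby1 is down, so only standby2 is reachable.
print(pick_failover_target({"standby2": "0/3000148"}, "0/3000100"))
print(pick_failover_target({"standby2": "0/2FFFF00"}, "0/3000100"))
```

The same comparison would apply when deciding where a logical subscriber may follow: only a standby that is at or past the quorum LSN is a safe target.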



> BTW, I have also suggested this idea in thread [2]. I don't recall all
> the ideas/points discussed in that thread but it would be good to
> check that thread for any alternative ideas and points raised, so that
> we don't miss anything.
>

Thanks for sharing the links; the approach is similar. Defaulting to
SAME_AS_SYNCREP_STANDBYS is an interesting option.
I like the idea of avoiding duplicate lists unless the user wants to
maintain a separate list.

Thanks,
Satya




end of thread, other threads:[~2026-05-21 09:12 UTC | newest]

Thread overview: 11+ messages
2026-02-26 04:58 Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication Ashutosh Sharma <[email protected]>
2026-02-26 06:19 ` Amit Kapila <[email protected]>
2026-02-26 07:42   ` Ashutosh Sharma <[email protected]>
2026-02-26 08:23     ` SATYANARAYANA NARLAPURAM <[email protected]>
2026-02-26 08:45       ` shveta malik <[email protected]>
2026-02-26 09:11         ` Ashutosh Sharma <[email protected]>
2026-02-26 10:46           ` SATYANARAYANA NARLAPURAM <[email protected]>
2026-05-21 09:12             ` Ashutosh Sharma <[email protected]>
2026-02-26 09:29         ` Alexander Kukushkin <[email protected]>
2026-02-26 10:38           ` SATYANARAYANA NARLAPURAM <[email protected]>
2026-02-26 08:02   ` SATYANARAYANA NARLAPURAM <[email protected]>
