public inbox for [email protected]
From: Ashutosh Sharma <[email protected]>
To: SATYANARAYANA NARLAPURAM <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
Date: Wed, 25 Feb 2026 19:21:17 +0530
Message-ID: <CAE9k0PnP0cPuisVeXM+Bma7n6J+HYqhVO5LffosXuHSw7drEDQ@mail.gmail.com>
In-Reply-To: <CAHg+QDfU7rOebrLDESPpHSgdiadKbpCOmBokcbmM6Gr+A5VobQ@mail.gmail.com>
References: <CAHg+QDfU7rOebrLDESPpHSgdiadKbpCOmBokcbmM6Gr+A5VobQ@mail.gmail.com>
Hi Satya,
On Wed, Feb 25, 2026 at 3:38 AM SATYANARAYANA NARLAPURAM
<[email protected]> wrote:
>
>
> Hi hackers,
>
> synchronized_standby_slots requires that every physical slot listed in the GUC has caught up before a logical failover slot is allowed to proceed with decoding. This is an ALL-of-N semantic. This logical slot availability model does not align with the quorum replication semantics of synchronous_standby_names, which can be configured for quorum commit (ANY M of N).
>
> In a typical 3-node HA deployment with quorum-based synchronous replication:
>
> Primary, standby1 (corresponds to sb1_slot), standby2 (corresponds to sb2_slot)
> synchronized_standby_slots = ' sb1_slot, sb2_slot'
> synchronous_standby_names = 'ANY 1 (standby1, standby2)'
>
> If standby1 goes down, synchronous commits still succeed because standby2 satisfies the quorum. However, logical decoding blocks indefinitely in WaitForStandbyConfirmation(), waiting for sb1_slot (corresponding to standby1) to catch up, even though the transaction is already safely committed on a quorum of synchronous standbys. This prevents logical decoding consumers from progressing and is inconsistent with the availability guarantee the DBA intended by choosing quorum commit.
+1. This can indeed be a blocker for failover-enabled logical
replication. It not only has the potential to disrupt logical
replication, but can also impact the primary server: because the
logical slots cannot advance while decoding is blocked, WAL may
silently accumulate on the primary over time, eventually causing
disk-full scenarios and degrading the performance of applications
running on the primary instance. Therefore, I too strongly believe
this needs to be addressed to prevent such potentially disruptive
situations.
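
To make the mismatch concrete, here is a minimal C sketch that models
each physical slot in synchronized_standby_slots as a name plus a
single WAL position the standby has confirmed. The types and functions
(lsn_t, slot_state, all_slots_caught_up, quorum_caught_up) are
hypothetical and are not PostgreSQL's StandbySlotsHaveCaughtup(); they
only contrast the current ALL-of-N rule with an ANY-M-of-N rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position (similar in spirit to XLogRecPtr). */
typedef uint64_t lsn_t;

/* Hypothetical model of one physical slot listed in synchronized_standby_slots. */
typedef struct
{
    const char *name;
    lsn_t       confirmed_lsn;  /* how far this standby has confirmed WAL */
} slot_state;

/* Count the slots whose confirmed position has reached target_lsn. */
static size_t
count_caught_up(const slot_state *slots, size_t n, lsn_t target_lsn)
{
    size_t caught_up = 0;

    for (size_t i = 0; i < n; i++)
        if (slots[i].confirmed_lsn >= target_lsn)
            caught_up++;
    return caught_up;
}

/* ALL-of-N: decoding may proceed only if every listed slot caught up. */
static bool
all_slots_caught_up(const slot_state *slots, size_t n, lsn_t target_lsn)
{
    return count_caught_up(slots, n, target_lsn) == n;
}

/* ANY-M-of-N: decoding may proceed once at least m listed slots caught up. */
static bool
quorum_caught_up(const slot_state *slots, size_t n, size_t m, lsn_t target_lsn)
{
    assert(m >= 1 && m <= n);
    return count_caught_up(slots, n, target_lsn) >= m;
}
```

With sb1_slot stuck at LSN 100 (standby1 down) and sb2_slot at 500,
all_slots_caught_up(slots, 2, 500) returns false, so decoding keeps
waiting, while quorum_caught_up(slots, 2, 1, 500) returns true, which
matches what 'ANY 1' already accepts for commits.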
>
>
> Proposal:
>
> Make synchronized_standby_slots quorum-aware, i.e., extend the GUC to accept an ANY M (slot1, slot2, ...) syntax similar to synchronous_standby_names, so that StandbySlotsHaveCaughtup() can return true when M of N slots (where 1 <= M <= N) have caught up. I still prefer two different GUCs for this, as the list of slots to be synchronized can still differ (for example, a DBA may want to ensure that a remote geo standby is in sync before allowing the logical decoding client to read the changes). I kept the synchronized_standby_slots parse logic similar to synchronous_standby_names to keep things simple. The default behavior of synchronized_standby_slots is also unchanged.
>
Thank you for the proposal. I can spend some time reviewing the
changes and help take this forward. I would also be happy to hear
others' thoughts and feedback on the proposal.
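
For the ANY M (slot1, slot2, ...) syntax in the proposal, a minimal C
parsing sketch follows. It is a small hand-written parser, not the
bison/flex grammar PostgreSQL uses for synchronous_standby_names, and
it does not support FIRST or quoted names. The struct, limits and
function names are hypothetical; slot names are restricted to letters,
digits and underscores, and an empty value (which disables the feature
in PostgreSQL) is simply rejected here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limits, chosen only to keep the sketch allocation-free. */
#define MAX_SLOTS 16
#define MAX_SLOT_NAME 64

/* Hypothetical parsed form of synchronized_standby_slots. */
typedef struct
{
    int  num_required;                    /* M: slots that must catch up */
    int  num_slots;                       /* N: slots listed */
    char names[MAX_SLOTS][MAX_SLOT_NAME]; /* slot names, in listed order */
} standby_slots_config;

static const char *
skip_spaces(const char *p)
{
    while (isspace((unsigned char) *p))
        p++;
    return p;
}

/* True if p starts with the keyword ANY (any case) followed by whitespace. */
static bool
starts_with_any(const char *p)
{
    return toupper((unsigned char) p[0]) == 'A' &&
           toupper((unsigned char) p[1]) == 'N' &&
           toupper((unsigned char) p[2]) == 'Y' &&
           isspace((unsigned char) p[3]);
}

/* Parse "name, name, ..." up to end_char ('\0' or ')'); NULL on error. */
static const char *
parse_name_list(const char *p, char end_char, standby_slots_config *cfg)
{
    cfg->num_slots = 0;
    for (;;)
    {
        size_t len = 0;

        p = skip_spaces(p);
        while (isalnum((unsigned char) p[len]) || p[len] == '_')
            len++;
        if (len == 0 || len >= MAX_SLOT_NAME || cfg->num_slots >= MAX_SLOTS)
            return NULL;
        memcpy(cfg->names[cfg->num_slots], p, len);
        cfg->names[cfg->num_slots][len] = '\0';
        cfg->num_slots++;

        p = skip_spaces(p + len);
        if (*p != ',')
            return (*p == end_char) ? p : NULL;
        p++;                              /* consume ',' and read next name */
    }
}

/* Accepts "ANY M (s1, s2, ...)" or a plain list "s1, s2, ..." (ALL of N). */
static bool
parse_standby_slots(const char *value, standby_slots_config *cfg)
{
    const char *p;

    assert(value != NULL && cfg != NULL);
    p = skip_spaces(value);

    if (starts_with_any(p))
    {
        char *end;
        long  m = strtol(p + 3, &end, 10);

        if (end == p + 3 || m < 1)
            return false;
        p = skip_spaces(end);
        if (*p != '(')
            return false;
        p = parse_name_list(p + 1, ')', cfg);
        if (p == NULL || *skip_spaces(p + 1) != '\0' || m > cfg->num_slots)
            return false;
        cfg->num_required = (int) m;
        return true;
    }

    if (parse_name_list(p, '\0', cfg) == NULL)
        return false;
    cfg->num_required = cfg->num_slots;   /* default: every listed slot */
    return true;
}
```

Here parse_standby_slots("ANY 1 (sb1_slot, sb2_slot)", &cfg) yields
num_required = 1 and num_slots = 2, while the existing form
' sb1_slot, sb2_slot' yields num_required = 2. A quorum-aware check
would then only need to compare the number of caught-up slots against
num_required instead of num_slots, leaving the default behavior intact.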
--
With Regards,
Ashutosh Sharma.