Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

From: SATYANARAYANA NARLAPURAM <[email protected]>
To: Bharath Rupireddy <[email protected]>
Cc: Bruce Momjian <[email protected]>
Cc: Andrey Borodin <[email protected]>
Cc: Kyotaro Horiguchi <[email protected]>
Cc: Laurenz Albe <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
Date: Wed, 18 Mar 2026 00:28:44 -0700
Message-ID: <CAHg+QDdd7BXB9HD9ddevk_D5TtweEBantcvJ5up5hznryZ33_w@mail.gmail.com>
In-Reply-To: <CALj2ACVW1b7ue2qskO-Mef6975Mf3QZJs+47sHAgk8QB-bmDMA@mail.gmail.com>
References: <[email protected]>
	<CALj2ACWPMYoPSC3t-9uW+0gqDUcJf1mLww6hHzo2V2AvE-Tu+w@mail.gmail.com>
	<[email protected]>
	<CALj2ACXmMWtpmuT-=v8F+Lk4QCbdkeN+yHKXeRGKFfjG96YbKA@mail.gmail.com>
	<CALj2ACUO6oz-43ryqfMOVZ_Q-N10C5tkzKku12+QV02NnXsDrw@mail.gmail.com>
	<[email protected]>
	<CAAhFRxjFGSk-hVTjnpFwm1XBUcHL8Obugt=P+ixV5AD9H+Kkrw@mail.gmail.com>
	<CAAhFRxgcBy-UCvyJ1ZZ1UKf4Owrx4J2X1F4tN_FD=fh5wZgdkw@mail.gmail.com>
	<CALj2ACVG5KCoPD_5AF2_u07HuZe4ajaLWKycB6OBYsGuj67OhA@mail.gmail.com>
	<CAHg+QDf9sMJ-r9JqFQTALRy8dX8Mr6SoFEvXx8V-Tto10VcFPA@mail.gmail.com>
	<[email protected]>
	<CAHg+QDcw6er6maVHq5PGR2t7NePXMWD9Vih9TRE=8Z=LDxOE9w@mail.gmail.com>
	<CAHg+QDdz+7H8wuNk9A6nBFei46BWhBq1A37_O4kNmJpuSq+dUg@mail.gmail.com>
	<CALj2ACVW1b7ue2qskO-Mef6975Mf3QZJs+47sHAgk8QB-bmDMA@mail.gmail.com>

Reviving this thread.

On Sun, Jan 29, 2023 at 9:55 PM Bharath Rupireddy <[email protected]> wrote:

> For proc die, it looks like the suggestion was to process it
> immediately and upon next restart, don't allow user connections unless
> all sync standbys were caught up. However, we need to be able to allow
> replication connections from standbys so that they'll be able to
> stream the needed WAL and catch up with primary, allow superuser or
> users with pg_monitor role to connect to perform ALTER SYSTEM to
> remove the unresponsive sync standbys if any from the list or disable
> sync replication altogether or monitor for flush lsn/catch up status.
> And block all other connections. Note that replication, superuser and
> users with pg_monitor role connections are allowed only after the
> server reaches a consistent state not before that to not read any
> inconsistent data.
>

Allowing replication, superuser and pg_monitor seems reasonable to me.
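
To make the admission rule from the quoted paragraph concrete, here is a
minimal sketch in Python. It is not PostgreSQL code: ConnRequest, admit and
both boolean inputs are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConnRequest:
    """Hypothetical description of an incoming connection."""
    is_replication: bool = False
    is_superuser: bool = False
    has_pg_monitor: bool = False


def admit(req: ConnRequest, reached_consistency: bool,
          sync_standbys_caught_up: bool) -> bool:
    """Decide whether to accept a connection after a restart."""
    # Nobody connects before the server reaches a consistent state.
    if not reached_consistency:
        return False
    # Once the sync standbys have caught up, everyone is allowed in.
    if sync_standbys_caught_up:
        return True
    # Until then, only replication, superuser and pg_monitor connections.
    return req.is_replication or req.is_superuser or req.has_pg_monitor
```

With this rule a standby can still stream WAL and an admin can still run
ALTER SYSTEM, while ordinary clients wait until the standbys catch up.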


>
> The trickiest part of doing the above is how we detect upon restart
> that the server received proc die while waiting for sync replication
> ACK. One idea might be to set a flag in the control file before the
> crash. Second idea might be to write a marker file (although I don't
> favor this idea); presence indicates that the server was waiting for
> sync replication ACK before the crash. However, we may not detect all
> sorts of crashes in a backend when it is waiting for sync replication
> ACK to do any of these two ideas. Therefore, this may not be a
> complete solution.
>

You cannot control how the crash happens. It could be a simple power failure,
in which case neither the control-file flag nor the marker file would have
reached the disk.
Additionally, this would sit in the critical transaction commit path.


>
> Third idea might be to just let the primary wait for sync standbys to
> catch up upon restart irrespective of whether it was crashed or not
> while waiting for sync replication ACK. While this idea works well
> without having to detect all sorts of crashes, the primary may not
> come up if any unresponsive standbys are present (currently, the
> primary continues to be operational for read-only queries at least
> irrespective of whether sync standbys have caught up or not).
>

I prefer this approach because the primary will open connections for
reads and writes as soon as the quorum policy defined in
synchronous_standby_names is satisfied.
If the sync standbys make no progress, a Postgres admin has to step in
regardless.
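
As a rough illustration of the "wait on restart until the quorum is
satisfied" check, here is a simplified sketch in Python. It is not how the
server implements it: standbys_caught_up and its parameters are hypothetical,
LSNs are plain integers, and the FIRST/ANY handling only approximates the
semantics of synchronous_standby_names.

```python
def standbys_caught_up(method, num_sync, names, flush_lsn, target_lsn):
    """Return True once enough sync standbys have flushed up to target_lsn.

    method     -- "ANY" (quorum) or "FIRST" (priority)
    num_sync   -- number of standbys that must confirm
    names      -- standby names in priority order
    flush_lsn  -- dict of connected standby name -> flushed LSN (int)
    target_lsn -- primary's end-of-WAL LSN at restart (int)
    """
    # Standbys missing from flush_lsn are treated as not connected.
    connected = [n for n in names if n in flush_lsn]
    if method == "ANY":
        # Quorum: any num_sync of the listed standbys may confirm.
        acked = sum(1 for n in connected if flush_lsn[n] >= target_lsn)
        return acked >= num_sync
    if method == "FIRST":
        # Priority: the num_sync highest-priority connected standbys
        # must all confirm.
        chosen = connected[:num_sync]
        return (len(chosen) == num_sync
                and all(flush_lsn[n] >= target_lsn for n in chosen))
    raise ValueError(f"unknown method: {method}")
```

For example, with ANY 2 (s1, s2, s3) the primary could open as soon as two
of the three standbys report a flush LSN at or past the target, even if the
third is unresponsive.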

Thanks,
Satya
