Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
36+ messages / 8 participants
[nested] [flat]

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-04-26 06:26  Laurenz Albe <[email protected]>
  0 siblings, 2 replies; 36+ messages in thread

From: Laurenz Albe @ 2022-04-26 06:26 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; PostgreSQL Hackers <[email protected]>; +Cc: SATYANARAYANA NARLAPURAM <[email protected]>

On Mon, 2022-04-25 at 19:51 +0530, Bharath Rupireddy wrote:
> With synchronous replication typically all the transactions (txns)
> first locally get committed, then streamed to the sync standbys and
> the backend that generated the transaction will wait for ack from sync
> standbys. While waiting for ack, it may happen that the query or the
> txn gets canceled (QueryCancelPending is true) or the waiting backend
> is asked to exit (ProcDiePending is true). In either of these cases,
> the wait for ack gets canceled and leaves the txn in an inconsistent
> state [...]
> 
> Here's a proposal (mentioned previously by Satya [1]) to avoid the
> above problems:
> 1) Wait a configurable amount of time before canceling the sync
> replication by the backends i.e. delay processing of
> QueryCancelPending and ProcDiePending in Introduced a new timeout GUC
> synchronous_replication_naptime_before_cancel, when set, it will let
> the backends wait for the ack before canceling the synchronous
> replication so that the transaction can be available in sync standbys
> as well.
> 2) Wait for sync standbys to catch up upon restart after the crash or
> in the next txn after the old locally committed txn was canceled.

While this may mitigate the problem, I don't think it will deal with
all the cases which could cause a transaction to end up committed locally,
but not on the synchronous standby.  I think that only using the full
power of two-phase commit can make this bulletproof.

Is it worth adding additional complexity that is not a complete solution?

Yours,
Laurenz Albe







^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-05-09 09:20  Bharath Rupireddy <[email protected]>
  parent: Laurenz Albe <[email protected]>
  1 sibling, 2 replies; 36+ messages in thread

From: Bharath Rupireddy @ 2022-05-09 09:20 UTC (permalink / raw)
  To: Laurenz Albe <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Tue, Apr 26, 2022 at 11:57 AM Laurenz Albe <[email protected]> wrote:
>
> On Mon, 2022-04-25 at 19:51 +0530, Bharath Rupireddy wrote:
> > With synchronous replication typically all the transactions (txns)
> > first locally get committed, then streamed to the sync standbys and
> > the backend that generated the transaction will wait for ack from sync
> > standbys. While waiting for ack, it may happen that the query or the
> > txn gets canceled (QueryCancelPending is true) or the waiting backend
> > is asked to exit (ProcDiePending is true). In either of these cases,
> > the wait for ack gets canceled and leaves the txn in an inconsistent
> > state [...]
> >
> > Here's a proposal (mentioned previously by Satya [1]) to avoid the
> > above problems:
> > 1) Wait a configurable amount of time before canceling the sync
> > replication by the backends i.e. delay processing of
> > QueryCancelPending and ProcDiePending in Introduced a new timeout GUC
> > synchronous_replication_naptime_before_cancel, when set, it will let
> > the backends wait for the ack before canceling the synchronous
> > replication so that the transaction can be available in sync standbys
> > as well.
> > 2) Wait for sync standbys to catch up upon restart after the crash or
> > in the next txn after the old locally committed txn was canceled.
>
> While this may mitigate the problem, I don't think it will deal with
> all the cases which could cause a transaction to end up committed locally,
> but not on the synchronous standby.  I think that only using the full
> power of two-phase commit can make this bulletproof.

Not sure if it's recommended to use 2PC in postgres HA with sync
replication where the documentation says that "PREPARE TRANSACTION"
and other 2PC commands are "intended for use by external transaction
management systems" and with explicit transactions. Whereas, the txns
within a postgres HA with sync replication always don't have to be
explicit txns. Am I missing something here?

> Is it worth adding additional complexity that is not a complete solution?

The proposed approach helps to avoid some common possible problems
that arise with simple scenarios (like cancelling a long running query
while in SyncRepWaitForLSN) within sync replication.

[1] https://www.postgresql.org/docs/devel/sql-prepare-transaction.html

Regards,
Bharath Rupireddy.





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-05-09 09:44  Dilip Kumar <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  1 sibling, 0 replies; 36+ messages in thread

From: Dilip Kumar @ 2022-05-09 09:44 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Mon, May 9, 2022 at 2:50 PM Bharath Rupireddy
<[email protected]> wrote:
>
> On Tue, Apr 26, 2022 at 11:57 AM Laurenz Albe <[email protected]> wrote:
> >
> > On Mon, 2022-04-25 at 19:51 +0530, Bharath Rupireddy wrote:
> > > With synchronous replication typically all the transactions (txns)
> > > first locally get committed, then streamed to the sync standbys and
> > > the backend that generated the transaction will wait for ack from sync
> > > standbys. While waiting for ack, it may happen that the query or the
> > > txn gets canceled (QueryCancelPending is true) or the waiting backend
> > > is asked to exit (ProcDiePending is true). In either of these cases,
> > > the wait for ack gets canceled and leaves the txn in an inconsistent
> > > state [...]
> > >
> > > Here's a proposal (mentioned previously by Satya [1]) to avoid the
> > > above problems:
> > > 1) Wait a configurable amount of time before canceling the sync
> > > replication by the backends i.e. delay processing of
> > > QueryCancelPending and ProcDiePending in Introduced a new timeout GUC
> > > synchronous_replication_naptime_before_cancel, when set, it will let
> > > the backends wait for the ack before canceling the synchronous
> > > replication so that the transaction can be available in sync standbys
> > > as well.
> > > 2) Wait for sync standbys to catch up upon restart after the crash or
> > > in the next txn after the old locally committed txn was canceled.
> >
> > While this may mitigate the problem, I don't think it will deal with
> > all the cases which could cause a transaction to end up committed locally,
> > but not on the synchronous standby.  I think that only using the full
> > power of two-phase commit can make this bulletproof.
>
> Not sure if it's recommended to use 2PC in postgres HA with sync
> replication where the documentation says that "PREPARE TRANSACTION"
> and other 2PC commands are "intended for use by external transaction
> management systems" and with explicit transactions. Whereas, the txns
> within a postgres HA with sync replication always don't have to be
> explicit txns. Am I missing something here?
>
> > Is it worth adding additional complexity that is not a complete solution?
>
> The proposed approach helps to avoid some common possible problems
> that arise with simple scenarios (like cancelling a long running query
> while in SyncRepWaitForLSN) within sync replication.

IMHO, making it wait for some amount of time, based on GUC is not a
complete solution.  It is just a hack to avoid the problem in some
cases.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-05-09 11:09  Andrey Borodin <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  1 sibling, 1 reply; 36+ messages in thread

From: Andrey Borodin @ 2022-05-09 11:09 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

> On 9 May 2022, at 14:20, Bharath Rupireddy <[email protected]> wrote:
> 
> On Tue, Apr 26, 2022 at 11:57 AM Laurenz Albe <[email protected]> wrote:
>> 
>> While this may mitigate the problem, I don't think it will deal with
>> all the cases which could cause a transaction to end up committed locally,
>> but not on the synchronous standby.  I think that only using the full
>> power of two-phase commit can make this bulletproof.
> 
> Not sure if it's recommended to use 2PC in postgres HA with sync
> replication where the documentation says that "PREPARE TRANSACTION"
> and other 2PC commands are "intended for use by external transaction
> management systems" and with explicit transactions. Whereas, the txns
> within a postgres HA with sync replication always don't have to be
> explicit txns. Am I missing something here?

COMMIT PREPARED needs to be replicated as well, thus encountering the very same problem as usual COMMIT: if done during failover it can be canceled and committed data can be wrongfully reported durably written. 2PC is not a remedy to the fact that PG silently cancels awaiting of sync replication. The problem arise in presence of any "commit". And "commit" is there if transactions are there.

> On 9 May 2022, at 14:44, Dilip Kumar <[email protected]> wrote:
> 
> IMHO, making it wait for some amount of time, based on GUC is not a
> complete solution.  It is just a hack to avoid the problem in some
> cases.

Disallowing cancelation of locally committed transactions is not a hack. It's removing of a hack that was erroneously installed to make backend responsible to Ctrl+C (or client side statement timeout).

> On 26 Apr 2022, at 11:26, Laurenz Albe <[email protected]> wrote:
> 
> Is it worth adding additional complexity that is not a complete solution?

Its not additional complexity. It is removing additional complexity that made sync rep interruptible. (But I'm surely talking not about GUCs like synchronous_replication_naptime_before_cancel - wait of sync rep must be indefinite until synchrous_commit\synchronous_standby_names are satisfied )

And yes, we need additional complexity - but in some other place. Transaction can also be locally committed in presence of a server crash. But this another difficult problem. Crashed server must not allow data queries until LSN of timeline end is successfully replicated to synchronous_standby_names.

Best regards, Andrey Borodin.

^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-05-10 07:48  Dilip Kumar <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 36+ messages in thread

From: Dilip Kumar @ 2022-05-10 07:48 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Bharath Rupireddy <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Mon, May 9, 2022 at 4:39 PM Andrey Borodin <[email protected]> wrote:

> > On 9 May 2022, at 14:44, Dilip Kumar <[email protected]> wrote:
> >
> > IMHO, making it wait for some amount of time, based on GUC is not a
> > complete solution.  It is just a hack to avoid the problem in some
> > cases.
>
> Disallowing cancelation of locally committed transactions is not a hack. It's removing of a hack that was erroneously installed to make backend responsible to Ctrl+C (or client side statement timeout).

I might be missing something but based on my understanding the
approach is not disallowing the query cancellation but it is just
adding the configuration for how much to delay before canceling the
query.  That's the reason I mentioned that this is not a guarenteed
solution.  I mean with this configuration value also you can not avoid
problems in all the cases, right?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-05-10 07:59  Bharath Rupireddy <[email protected]>
  parent: Dilip Kumar <[email protected]>
  0 siblings, 1 reply; 36+ messages in thread

From: Bharath Rupireddy @ 2022-05-10 07:59 UTC (permalink / raw)
  To: Dilip Kumar <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Tue, May 10, 2022 at 1:18 PM Dilip Kumar <[email protected]> wrote:
>
> On Mon, May 9, 2022 at 4:39 PM Andrey Borodin <[email protected]> wrote:
>
> > > On 9 May 2022, at 14:44, Dilip Kumar <[email protected]> wrote:
> > >
> > > IMHO, making it wait for some amount of time, based on GUC is not a
> > > complete solution.  It is just a hack to avoid the problem in some
> > > cases.
> >
> > Disallowing cancelation of locally committed transactions is not a hack. It's removing of a hack that was erroneously installed to make backend responsible to Ctrl+C (or client side statement timeout).
>
> I might be missing something but based on my understanding the
> approach is not disallowing the query cancellation but it is just
> adding the configuration for how much to delay before canceling the
> query.  That's the reason I mentioned that this is not a guarenteed
> solution.  I mean with this configuration value also you can not avoid
> problems in all the cases, right?

Yes Dilip, the proposed GUC in v1 patch doesn't allow waiting forever
for sync repl ack, in other words, doesn't allow blocking the pending
query cancels or proc die interrupts forever. The backends may linger
in case repl ack isn't received or sync replicas aren't reachable?
Users may have to set the GUC to a 'reasonable value'.

If okay, I can make the GUC behave this way - value 0 existing
behaviour i.e. no wait for sync repl ack, just process query cancels
and proc die interrupts immediately; value -1 wait unboundedly for the
ack; value > 0 wait for specified milliseconds for the ack.

Regards,
Bharath Rupireddy.





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-05-10 12:25  Andrey Borodin <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 2 replies; 36+ messages in thread

From: Andrey Borodin @ 2022-05-10 12:25 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Dilip Kumar <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>



> 10 мая 2022 г., в 12:59, Bharath Rupireddy <[email protected]> написал(а):
> 
> If okay, I can make the GUC behave this way - value 0 existing
> behaviour i.e. no wait for sync repl ack, just process query cancels
> and proc die interrupts immediately; value -1 wait unboundedly for the
> ack; value > 0 wait for specified milliseconds for the ack.
+1 if we make -1 and 0 only valid values.

> query cancels or proc die interrupts

Please note, that typical HA tool would need to handle query cancels and proc die interrupts differently.

When the network is partitioned and somewhere standby is promoted you definitely want infinite wait for cancels. Yet once upon a time you want to shutdown postgres without coredump - thus proc die needs to be processed.

Thanks!

Best regards, Andrey Borodin.




^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-05-13 12:09  Bharath Rupireddy <[email protected]>
  parent: Andrey Borodin <[email protected]>
  1 sibling, 0 replies; 36+ messages in thread

From: Bharath Rupireddy @ 2022-05-13 12:09 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Dilip Kumar <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Tue, May 10, 2022 at 5:55 PM Andrey Borodin <[email protected]> wrote:
>
> > 10 мая 2022 г., в 12:59, Bharath Rupireddy <[email protected]> написал(а):
> >
> > If okay, I can make the GUC behave this way - value 0 existing
> > behaviour i.e. no wait for sync repl ack, just process query cancels
> > and proc die interrupts immediately; value -1 wait unboundedly for the
> > ack; value > 0 wait for specified milliseconds for the ack.
> +1 if we make -1 and 0 only valid values.
>
> > query cancels or proc die interrupts
>
> Please note, that typical HA tool would need to handle query cancels and proc die interrupts differently.

Agree.

> When the network is partitioned and somewhere standby is promoted you definitely want infinite wait for cancels.

When standby is promoted, no point the old primary waiting forever for
ack assuming we are going to discard it.

> Yet once upon a time you want to shutdown postgres without coredump - thus proc die needs to be processed.

I think users can still have the flexibility to set the right amounts
of time to process cancel and proc die interrupts.

IMHO, the GUC can still have 0, -1 and value > 0 in milliseconds, let
the users decide on what they want. Do you see any problems with this?

Thoughts?

Regards,
Bharath Rupireddy.





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-07-25 09:29  Bharath Rupireddy <[email protected]>
  parent: Andrey Borodin <[email protected]>
  1 sibling, 1 reply; 36+ messages in thread

From: Bharath Rupireddy @ 2022-07-25 09:29 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Dilip Kumar <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Tue, May 10, 2022 at 5:55 PM Andrey Borodin <[email protected]> wrote:
>
> > 10 мая 2022 г., в 12:59, Bharath Rupireddy <[email protected]> написал(а):
> >
> > If okay, I can make the GUC behave this way - value 0 existing
> > behaviour i.e. no wait for sync repl ack, just process query cancels
> > and proc die interrupts immediately; value -1 wait unboundedly for the
> > ack; value > 0 wait for specified milliseconds for the ack.
> +1 if we make -1 and 0 only valid values.
>
> > query cancels or proc die interrupts
>
> Please note, that typical HA tool would need to handle query cancels and proc die interrupts differently.

Hm, after thinking for a while, I tend to agree with the above
approach - meaning, query cancel interrupt processing can completely
be disabled in SyncRepWaitForLSN() and process proc die interrupt
immediately, this approach requires no GUC as opposed to the proposed
v1 patch upthread.

However, it's good to see what other hackers think about this.

> When the network is partitioned and somewhere standby is promoted you definitely want infinite wait for cancels. Yet once upon a time you want to shutdown postgres without coredump - thus proc die needs to be processed.

Agree.

On Mon, May 9, 2022 at 4:39 PM Andrey Borodin <[email protected]> wrote:
>
> > On 26 Apr 2022, at 11:26, Laurenz Albe <[email protected]> wrote:
> >
> > Is it worth adding additional complexity that is not a complete solution?
>
> Its not additional complexity. It is removing additional complexity that made sync rep interruptible. (But I'm surely talking not about GUCs like synchronous_replication_naptime_before_cancel - wait of sync rep must be indefinite until synchrous_commit\synchronous_standby_names are satisfied )
>
> And yes, we need additional complexity - but in some other place. Transaction can also be locally committed in presence of a server crash. But this another difficult problem. Crashed server must not allow data queries until LSN of timeline end is successfully replicated to synchronous_standby_names.

Hm, that needs to be done anyways. How about doing as proposed
initially upthread [1]? Also, quoting the idea here [2].

Thoughts?

[1] https://www.postgresql.org/message-id/[email protected]...
[2] 2) Wait for sync standbys to catch up upon restart after the crash or
in the next txn after the old locally committed txn was canceled. One
way to achieve this is to let the backend, that's making the first
connection, wait for sync standbys to catch up in ClientAuthentication
right after successful authentication. However, I'm not sure this is
the best way to do it at this point.

Regards,
Bharath Rupireddy.





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-07-25 10:50  Andrey Borodin <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 36+ messages in thread

From: Andrey Borodin @ 2022-07-25 10:50 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Dilip Kumar <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>



> 25 июля 2022 г., в 14:29, Bharath Rupireddy <[email protected]> написал(а):
> 
> Hm, after thinking for a while, I tend to agree with the above
> approach - meaning, query cancel interrupt processing can completely
> be disabled in SyncRepWaitForLSN() and process proc die interrupt
> immediately, this approach requires no GUC as opposed to the proposed
> v1 patch upthread.
GUC was proposed here[0] to maintain compatibility with previous behaviour. But I think that having no GUC here is fine too. If we do not allow cancelation of unreplicated backends, of course.


>> 
>> And yes, we need additional complexity - but in some other place. Transaction can also be locally committed in presence of a server crash. But this another difficult problem. Crashed server must not allow data queries until LSN of timeline end is successfully replicated to synchronous_standby_names.
> 
> Hm, that needs to be done anyways. How about doing as proposed
> initially upthread [1]? Also, quoting the idea here [2].
> 
> Thoughts?
> 
> [1] https://www.postgresql.org/message-id/[email protected]...
> [2] 2) Wait for sync standbys to catch up upon restart after the crash or
> in the next txn after the old locally committed txn was canceled. One
> way to achieve this is to let the backend, that's making the first
> connection, wait for sync standbys to catch up in ClientAuthentication
> right after successful authentication. However, I'm not sure this is
> the best way to do it at this point.


I think ideally startup process should not allow read only connections in CheckRecoveryConsistency() until WAL is not replicated to quorum al least up until new timeline LSN.

Thanks!

[0] https://commitfest.postgresql.org/34/2402/






^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-08-04 08:12  Bharath Rupireddy <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 36+ messages in thread

From: Bharath Rupireddy @ 2022-08-04 08:12 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Dilip Kumar <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Mon, Jul 25, 2022 at 4:20 PM Andrey Borodin <[email protected]> wrote:
>
> > 25 июля 2022 г., в 14:29, Bharath Rupireddy <[email protected]> написал(а):
> >
> > Hm, after thinking for a while, I tend to agree with the above
> > approach - meaning, query cancel interrupt processing can completely
> > be disabled in SyncRepWaitForLSN() and process proc die interrupt
> > immediately, this approach requires no GUC as opposed to the proposed
> > v1 patch upthread.
> GUC was proposed here[0] to maintain compatibility with previous behaviour. But I think that having no GUC here is fine too. If we do not allow cancelation of unreplicated backends, of course.
>
> >>
> >> And yes, we need additional complexity - but in some other place. Transaction can also be locally committed in presence of a server crash. But this another difficult problem. Crashed server must not allow data queries until LSN of timeline end is successfully replicated to synchronous_standby_names.
> >
> > Hm, that needs to be done anyways. How about doing as proposed
> > initially upthread [1]? Also, quoting the idea here [2].
> >
> > Thoughts?
> >
> > [1] https://www.postgresql.org/message-id/[email protected]...
> > [2] 2) Wait for sync standbys to catch up upon restart after the crash or
> > in the next txn after the old locally committed txn was canceled. One
> > way to achieve this is to let the backend, that's making the first
> > connection, wait for sync standbys to catch up in ClientAuthentication
> > right after successful authentication. However, I'm not sure this is
> > the best way to do it at this point.
>
>
> I think ideally startup process should not allow read only connections in CheckRecoveryConsistency() until WAL is not replicated to quorum al least up until new timeline LSN.

We can't do it in CheckRecoveryConsistency() unless I'm missing
something. Because, the walsenders (required for sending the remaining
WAL to sync standbys to achieve quorum) can only be started after the
server reaches a consistent state, after all walsenders are
specialized backends.



-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-08-04 08:31  Bharath Rupireddy <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 0 replies; 36+ messages in thread

From: Bharath Rupireddy @ 2022-08-04 08:31 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; PostgreSQL Hackers <[email protected]>; Dilip Kumar <[email protected]>; Laurenz Albe <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Thu, Aug 4, 2022 at 1:42 PM Bharath Rupireddy
<[email protected]> wrote:
>
> On Mon, Jul 25, 2022 at 4:20 PM Andrey Borodin <[email protected]> wrote:
> >
> > > 25 июля 2022 г., в 14:29, Bharath Rupireddy <[email protected]> написал(а):
> > >
> > > Hm, after thinking for a while, I tend to agree with the above
> > > approach - meaning, query cancel interrupt processing can completely
> > > be disabled in SyncRepWaitForLSN() and process proc die interrupt
> > > immediately, this approach requires no GUC as opposed to the proposed
> > > v1 patch upthread.
> > GUC was proposed here[0] to maintain compatibility with previous behaviour. But I think that having no GUC here is fine too. If we do not allow cancelation of unreplicated backends, of course.
> >
> > >>
> > >> And yes, we need additional complexity - but in some other place. Transaction can also be locally committed in presence of a server crash. But this another difficult problem. Crashed server must not allow data queries until LSN of timeline end is successfully replicated to synchronous_standby_names.
> > >
> > > Hm, that needs to be done anyways. How about doing as proposed
> > > initially upthread [1]? Also, quoting the idea here [2].
> > >
> > > Thoughts?
> > >
> > > [1] https://www.postgresql.org/message-id/[email protected]...
> > > [2] 2) Wait for sync standbys to catch up upon restart after the crash or
> > > in the next txn after the old locally committed txn was canceled. One
> > > way to achieve this is to let the backend, that's making the first
> > > connection, wait for sync standbys to catch up in ClientAuthentication
> > > right after successful authentication. However, I'm not sure this is
> > > the best way to do it at this point.
> >
> >
> > I think ideally startup process should not allow read only connections in CheckRecoveryConsistency() until WAL is not replicated to quorum al least up until new timeline LSN.
>
> We can't do it in CheckRecoveryConsistency() unless I'm missing
> something. Because, the walsenders (required for sending the remaining
> WAL to sync standbys to achieve quorum) can only be started after the
> server reaches a consistent state, after all walsenders are
> specialized backends.

Continuing on the above thought (I inadvertently clicked the send
button previously): A simple approach would be to check for quorum in
PostgresMain() before entering the query loop for (;;) for
non-walsender cases. A disadvantage of this would be that all the
backends will be waiting here in the worst case if it takes time for
achieving the sync quorum after restart -  roughly we can do the
following in PostgresMain(), of course we need locking mechanism so
that all the backends whoever reaches here will wait for the same lsn:

if (sync_replicaion_defined == true &&
shmem->wait_for_sync_repl_upon_restart == true)
{
      SyncRepWaitForLSN(pg_current_wal_flush_lsn(), false);
      shmem->wait_for_sync_repl_upon_restart = false;
}

Thoughts?

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-08-05 02:49  Kyotaro Horiguchi <[email protected]>
  parent: Laurenz Albe <[email protected]>
  1 sibling, 1 reply; 36+ messages in thread

From: Kyotaro Horiguchi @ 2022-08-05 02:49 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]; [email protected]; [email protected]

At Tue, 26 Apr 2022 08:26:59 +0200, Laurenz Albe <[email protected]> wrote in 
> While this may mitigate the problem, I don't think it will deal with
> all the cases which could cause a transaction to end up committed locally,
> but not on the synchronous standby.  I think that only using the full
> power of two-phase commit can make this bulletproof.
> 
> Is it worth adding additional complexity that is not a complete solution?

I would agree to this. Likewise 2PC, whatever we do to make it
perfect, we're left with unresolvable problems at least for now.

Doesn't it meet your requirements if we have a means to know the last
transaction on the current session is locally committed or aborted?

We are already internally managing last committed LSN. I think we can
do the same thing about transaction abort and last inserted LSN and we
can expose them any way. This is way simpler than the (maybe)
uncompletable attempt to fill up the deep gap.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-08-08 13:43  Bharath Rupireddy <[email protected]>
  parent: Kyotaro Horiguchi <[email protected]>
  0 siblings, 1 reply; 36+ messages in thread

From: Bharath Rupireddy @ 2022-08-08 13:43 UTC (permalink / raw)
  To: Kyotaro Horiguchi <[email protected]>; +Cc: Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Fri, Aug 5, 2022 at 8:19 AM Kyotaro Horiguchi
<[email protected]> wrote:
>
> At Tue, 26 Apr 2022 08:26:59 +0200, Laurenz Albe <[email protected]> wrote in
> > While this may mitigate the problem, I don't think it will deal with
> > all the cases which could cause a transaction to end up committed locally,
> > but not on the synchronous standby.  I think that only using the full
> > power of two-phase commit can make this bulletproof.
> >
> > Is it worth adding additional complexity that is not a complete solution?
>
> I would agree to this. Likewise 2PC, whatever we do to make it
> perfect, we're left with unresolvable problems at least for now.
>
> Doesn't it meet your requirements if we have a means to know the last
> transaction on the current session is locally committed or aborted?
>
> We are already internally managing last committed LSN. I think we can
> do the same thing about transaction abort and last inserted LSN and we
> can expose them any way. This is way simpler than the (maybe)
> uncompletable attempt to fill up the deep gap.

There can be more txns that are
locally-committed-but-not-yet-replicated. Even if we have that
information stored somewhere, what do we do with it? Those txns are
committed from the client perspective but not committed from the
server's perspective.

Can you please explain more about your idea, I may be missing something?

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-08-09 07:12  Kyotaro Horiguchi <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 36+ messages in thread

From: Kyotaro Horiguchi @ 2022-08-09 07:12 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]; [email protected]; [email protected]

At Mon, 8 Aug 2022 19:13:25 +0530, Bharath Rupireddy <[email protected]> wrote in 
> On Fri, Aug 5, 2022 at 8:19 AM Kyotaro Horiguchi
> <[email protected]> wrote:
> >
> > At Tue, 26 Apr 2022 08:26:59 +0200, Laurenz Albe <[email protected]> wrote in
> > > While this may mitigate the problem, I don't think it will deal with
> > > all the cases which could cause a transaction to end up committed locally,
> > > but not on the synchronous standby.  I think that only using the full
> > > power of two-phase commit can make this bulletproof.
> > >
> > > Is it worth adding additional complexity that is not a complete solution?
> >
> > I would agree to this. Likewise 2PC, whatever we do to make it
> > perfect, we're left with unresolvable problems at least for now.
> >
> > Doesn't it meet your requirements if we have a means to know the last
> > transaction on the current session is locally committed or aborted?
> >
> > We are already internally managing last committed LSN. I think we can
> > do the same thing about transaction abort and last inserted LSN and we
> > can expose them any way. This is way simpler than the (maybe)
> > uncompletable attempt to fill up the deep gap.
> 
> There can be more txns that are
> locally-committed-but-not-yet-replicated. Even if we have that
> information stored somewhere, what do we do with it? Those txns are
> committed from the client perspective but not committed from the
> server's perspective.
> 
> Can you please explain more about your idea, I may be missing something?

(I'm not sure I understand the requirements here..)

I understand that it is about query cancellation.  In the case of
primary crash/termination, client cannot even know whether the commit
of the ongoing transaction, if any, has been recorded.  Anyway no way
other than to somehow confirm that the change by the transaction has
been actually made after restart.  I believe it is the standard
practice of the applications that work on HA clusters.

The same is true in the case of query cancellation since commit
response doesn't reach the client, too.  But even in this case if we
had functions/views that tells us the
last-committed/last-aborted/last-inserted LSN on a session, we can
know whether the last transaction has been committed along with the
commit LSN maybe more easily.

# In fact, I see those functions rather as a means to know whether a
# change by the last transaction on a session is available on some
# replica.

For example, the below heavily simplified pseudo code might display
how the fucntions (if they were functions) work.

  try {
    s.execute("INSERT ..");
    c.commit();
  } catch (Exception x) {
    c.commit();
    if (s.execute("SELECT pg_session_last_committed_lsn() = "
                           "pg_session_last_inserted_lsn()"))
    {
      /* the transaction has been locally committed */
      if (s.execute("SELECT replay_lsn >= pg_session_last_committed_lsn() "
                           "FROM pg_stat_replication where xxxx")
	   /* the commit has been replicated to xxx, LSN is known */
    } else {
      /* the transaction has not been locally committed */
      <retry?>
    }
  }


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-08-09 08:46  Bharath Rupireddy <[email protected]>
  parent: Kyotaro Horiguchi <[email protected]>
  0 siblings, 1 reply; 36+ messages in thread

From: Bharath Rupireddy @ 2022-08-09 08:46 UTC (permalink / raw)
  To: Kyotaro Horiguchi <[email protected]>; +Cc: Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Tue, Aug 9, 2022 at 12:42 PM Kyotaro Horiguchi
<[email protected]> wrote:
>
> > Can you please explain more about your idea, I may be missing something?
>
> (I'm not sure I understand the requirements here..)

I've explained the problem with the current HA setup with synchronous
replication upthread at [1]. Let me reiterate it here once again.

When a query is cancelled (a simple stroke of CTRL+C or
pg_cancel_backend() call) while the txn is waiting for ack in
SyncRepWaitForLSN(); for the client, the txn is actually committed
(locally-committed-but-not-yet-replicated to all of sync standbys)
like any other txn, a warning is emitted into server logs but it is of
no use for the client (think of client as applications). There can be
many such txns waiting for ack in SyncRepWaitForLSN() and query cancel
can be issued on all of those sessions. The problem is that the
subsequent reads will then be able to read all of those
locally-committed-but-not-yet-replicated to all of sync standbys txns
data - this is what I call an inconsistency (can we call this a
read-after-write inconsistency?) because of lack of proper query
cancel handling. And if the sync standbys are down or unable to come
up for some reason, until then, the primary will be serving clients
with the inconsistent data. BTW, I found a report of this problem here
[2].

The solution proposed for the above problem is to just 'not honor
query cancels at all while waiting for ack in SyncRepWaitForLSN()'.

When a proc die is pending, then also, there can be
locally-committed-but-not-yet-replicated to all of sync standbys txns.
Typically, there are two choices for the clients 1) reuse the primary
instance after restart 2) failover to one of sync standbys. For case
(1), there might be read-after-write inconsistency as explained above.
For case (2), those txns might get lost completely if the failover
target sync standby or the new primary didn't receive them and the
other sync standbys that have received them are now ahead and need a
special treatment (run pg_rewind) for them to be able to connect to
new primary.

The solution proposed for case (1) of the above problem is to 'process
the ProcDiePending immediately and upon restart the first backend can
wait until the sync standbys are caught up to ensure no inconsistent
reads'.
The solution proposed for case (2) of the above problem is to 'either
run pg_rewind for the sync standbys that are ahead or use the idea
proposed at [3]'.

I hope the above explanation helps.

[1] https://www.postgresql.org/message-id/flat/CALj2ACUrOB59QaE6%3DjF2cFAyv1MR7fzD8tr4YM5%2BOwEYG1SNzA%4...
[2] https://stackoverflow.com/questions/42686097/how-to-disable-uncommited-reads-in-postgres-synchronous...
[3] https://www.postgresql.org/message-id/CALj2ACX-xO-ZenQt1MWazj0Z3ziSXBMr24N_X2c0dYysPQghrw%40mail.gma...

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/

^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-09-27 13:22  Bharath Rupireddy <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 36+ messages in thread

From: Bharath Rupireddy @ 2022-09-27 13:22 UTC (permalink / raw)
  To: Kyotaro Horiguchi <[email protected]>; +Cc: Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Tue, Aug 9, 2022 at 2:16 PM Bharath Rupireddy
<[email protected]> wrote:
>
> I've explained the problem with the current HA setup with synchronous
> replication upthread at [1]. Let me reiterate it here once again.
>
> When a query is cancelled (a simple stroke of CTRL+C or
> pg_cancel_backend() call) while the txn is waiting for ack in
> SyncRepWaitForLSN(); for the client, the txn is actually committed
> (locally-committed-but-not-yet-replicated to all of sync standbys)
> like any other txn, a warning is emitted into server logs but it is of
> no use for the client (think of client as applications). There can be
> many such txns waiting for ack in SyncRepWaitForLSN() and query cancel
> can be issued on all of those sessions. The problem is that the
> subsequent reads will then be able to read all of those
> locally-committed-but-not-yet-replicated to all of sync standbys txns
> data - this is what I call an inconsistency (can we call this a
> read-after-write inconsistency?) because of lack of proper query
> cancel handling. And if the sync standbys are down or unable to come
> up for some reason, until then, the primary will be serving clients
> with the inconsistent data. BTW, I found a report of this problem here
> [2].
>
> The solution proposed for the above problem is to just 'not honor
> query cancels at all while waiting for ack in SyncRepWaitForLSN()'.
>
> When a proc die is pending, then also, there can be
> locally-committed-but-not-yet-replicated to all of sync standbys txns.
> Typically, there are two choices for the clients 1) reuse the primary
> instance after restart 2) failover to one of sync standbys. For case
> (1), there might be read-after-write inconsistency as explained above.
> For case (2), those txns might get lost completely if the failover
> target sync standby or the new primary didn't receive them and the
> other sync standbys that have received them are now ahead and need a
> special treatment (run pg_rewind) for them to be able to connect to
> new primary.
>
> The solution proposed for case (1) of the above problem is to 'process
> the ProcDiePending immediately and upon restart the first backend can
> wait until the sync standbys are caught up to ensure no inconsistent
> reads'.
> The solution proposed for case (2) of the above problem is to 'either
> run pg_rewind for the sync standbys that are ahead or use the idea
> proposed at [3]'.
>
> I hope the above explanation helps.
>
> [1] https://www.postgresql.org/message-id/flat/CALj2ACUrOB59QaE6%3DjF2cFAyv1MR7fzD8tr4YM5%2BOwEYG1SNzA%4...
> [2] https://stackoverflow.com/questions/42686097/how-to-disable-uncommited-reads-in-postgres-synchronous...
> [3] https://www.postgresql.org/message-id/CALj2ACX-xO-ZenQt1MWazj0Z3ziSXBMr24N_X2c0dYysPQghrw%40mail.gma...

I'm attaching the v2 patch rebased on the latest HEAD. Please note
that there are still some open points, I'm yet to find time to think
more about them. Meanwhile, I'm posting the v2 patch for making cfbot
happy. Any further thoughts on the overall design of the patch are
most welcome. Thanks.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v2-0001-Wait-specified-amount-of-time-before-cancelling-s.patch (8.1K, 2-v2-0001-Wait-specified-amount-of-time-before-cancelling-s.patch)
  download | inline diff:
From be6734c83d72333cc4ec1b2be1f4d54b875b74c8 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 27 Sep 2022 12:55:34 +0000
Subject: [PATCH v2] Wait specified amount of time before cancelling sync
 replication

In PostgreSQL high availability setup with synchronous replication,
typically all the transactions first locally get committed, then
streamed to the synchronous standbys and the backends that generated
the transaction will wait for acknowledgement from synchronous
standbys. While waiting for acknowledgement, it may happen that the
query or the transaction gets canceled or the backend that's waiting
for acknowledgement is asked to exit. In either of these cases, the
wait for acknowledgement gets canceled and leaves transaction in an
inconsistent state as it got committed locally but not on the
standbys. When set the GUC synchronous_replication_naptime_before_cancel
introduced in this patch, it will let the backends wait for the
acknowledgement before canceling the wait for acknowledgement so
that the transaction can be available in synchronous standbys as
well.
---
 doc/src/sgml/config.sgml                      | 30 +++++++++++
 src/backend/replication/syncrep.c             | 50 +++++++++++++++++++
 src/backend/utils/misc/guc_tables.c           | 12 +++++
 src/backend/utils/misc/postgresql.conf.sample |  2 +
 src/include/replication/syncrep.h             |  3 ++
 5 files changed, 97 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d8848bc774..baeef49012 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4524,6 +4524,36 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-synchronous-replication-naptime-before-cancel" xreflabel="synchronous_replication_naptime_before_cancel">
+      <term><varname>synchronous_replication_naptime_before_cancel</varname> (<type>integer</type>)
+      <indexterm>
+       <primary><varname>synchronous_replication_naptime_before_cancel</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies the amount of time in milliseconds to wait for synchronous
+        replication before cancelling. Default value is 0, a value of -1 or 0
+        disables this feature. In <productname>PostgreSQL</productname> high
+        availability setup with synchronous replication, typically all the
+        transactions first locally get committed, then streamed to the
+        synchronous standbys and the backends that generated the transaction
+        will wait for acknowledgement from synchronous standbys. While waiting
+        for acknowledgement, it may happen that the query or the transaction
+        gets canceled or the backend that's waiting for acknowledgement is
+        asked to exit. In either of these cases, the wait for acknowledgement
+        gets canceled and leaves transaction in an inconsistent state as it got
+        committed locally but not on the standbys. When set the
+        <varname>synchronous_replication_naptime_before_cancel</varname>
+        parameter, it will let the backends wait for the acknowledgement
+        before canceling the wait for acknowledgement so that the transaction
+        can be available in synchronous standbys as well. This parameter can
+        only be set in the <filename>postgresql.conf</filename> file or on the
+        server command line.
+        </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
 
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index e360d925b0..60b6a5e471 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -89,6 +89,7 @@
 
 /* User-settable parameters for sync rep */
 char	   *SyncRepStandbyNames;
+int			SyncRepNapTimeBeforeCancel = 0;
 
 #define SyncStandbysDefined() \
 	(SyncRepStandbyNames != NULL && SyncRepStandbyNames[0] != '\0')
@@ -120,6 +121,7 @@ static void SyncRepGetNthLatestSyncRecPtr(XLogRecPtr *writePtr,
 static int	SyncRepGetStandbyPriority(void);
 static int	standby_priority_comparator(const void *a, const void *b);
 static int	cmp_lsn(const void *a, const void *b);
+static bool SyncRepNapBeforeCancel(void);
 
 #ifdef USE_ASSERT_CHECKING
 static bool SyncRepQueueIsOrderedByLSN(int mode);
@@ -131,6 +133,42 @@ static bool SyncRepQueueIsOrderedByLSN(int mode);
  * ===========================================================
  */
 
+/*
+ * Wait for synchronous replication before cancelling, if requested by user.
+ */
+static bool
+SyncRepNapBeforeCancel(void)
+{
+	int wait_time;
+
+	if (SyncRepNapTimeBeforeCancel <= 0)
+		return false;
+
+	ereport(WARNING,
+			(errmsg_plural("waiting %d millisecond for synchronous replication before cancelling",
+						   "waiting %d milliseconds for synchronous replication before cancelling",
+							SyncRepNapTimeBeforeCancel,
+							SyncRepNapTimeBeforeCancel)));
+
+	wait_time = SyncRepNapTimeBeforeCancel;
+
+	while (wait_time > 0)
+	{
+		/*
+		 * Wait in intervals of 1 millisecond so that we can frequently check
+		 * for the acknowledgement.
+		 */
+		pg_usleep(1 * 1000L);
+
+		wait_time--;
+
+		if (MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE)
+			return true;
+	}
+
+	return true;
+}
+
 /*
  * Wait for synchronous replication, if requested by user.
  *
@@ -264,6 +302,12 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		 */
 		if (ProcDiePending)
 		{
+			if (SyncRepNapBeforeCancel())
+			{
+				if (MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE)
+					break;
+			}
+
 			ereport(WARNING,
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
 					 errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
@@ -281,6 +325,12 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
 		 */
 		if (QueryCancelPending)
 		{
+			if (SyncRepNapBeforeCancel())
+			{
+				if (MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE)
+					break;
+			}
+
 			QueryCancelPending = false;
 			ereport(WARNING,
 					(errmsg("canceling wait for synchronous replication due to user request"),
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index ab3140ff3a..bb14e47c45 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2496,6 +2496,18 @@ struct config_int ConfigureNamesInt[] =
 		0, 0, 1000000,			/* see ComputeXidHorizons */
 		NULL, NULL, NULL
 	},
+
+	{
+		{"synchronous_replication_naptime_before_cancel", PGC_SIGHUP, REPLICATION_PRIMARY,
+			gettext_noop("Sets the amount of time to wait for synchronous replictaion before cancelling."),
+			gettext_noop("A value of -1 or 0 disables this feature."),
+			GUC_UNIT_MS
+		},
+		&SyncRepNapTimeBeforeCancel,
+		0, 0, INT_MAX,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"vacuum_failsafe_age", PGC_USERSET, CLIENT_CONN_STATEMENT,
 			gettext_noop("Age at which VACUUM should trigger failsafe to avoid a wraparound outage."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2ae76e5cfb..7849e3c5b3 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -324,6 +324,8 @@
 				# and comma-separated list of application_name
 				# from standby(s); '*' = all
 #vacuum_defer_cleanup_age = 0	# number of xacts by which cleanup is delayed
+#synchronous_replication_naptime_before_cancel = 0 # amount of time to wait for
+				# synchronous replictaion before cancelling; 0 or -1 disables
 
 # - Standby Servers -
 
diff --git a/src/include/replication/syncrep.h b/src/include/replication/syncrep.h
index 0f3b3a7955..9b668e9a61 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -80,6 +80,9 @@ extern PGDLLIMPORT char *syncrep_parse_error_msg;
 /* user-settable parameters for synchronous replication */
 extern PGDLLIMPORT char *SyncRepStandbyNames;
 
+/* user-settable nap time for synchronous replictaion before cancelling */
+extern PGDLLIMPORT int SyncRepNapTimeBeforeCancel;
+
 /* called by user backend */
 extern void SyncRepWaitForLSN(XLogRecPtr lsn, bool commit);
 
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-09-29 22:53  Bruce Momjian <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 2 replies; 36+ messages in thread

From: Bruce Momjian @ 2022-09-29 22:53 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Tue, Sep 27, 2022 at 06:52:21PM +0530, Bharath Rupireddy wrote:
> On Tue, Aug 9, 2022 at 2:16 PM Bharath Rupireddy
> <[email protected]> wrote:
> >
> > I've explained the problem with the current HA setup with synchronous
> > replication upthread at [1]. Let me reiterate it here once again.
> >
> > When a query is cancelled (a simple stroke of CTRL+C or
> > pg_cancel_backend() call) while the txn is waiting for ack in
> > SyncRepWaitForLSN(); for the client, the txn is actually committed
> > (locally-committed-but-not-yet-replicated to all of sync standbys)
> > like any other txn, a warning is emitted into server logs but it is of
> > no use for the client (think of client as applications). There can be
> > many such txns waiting for ack in SyncRepWaitForLSN() and query cancel
> > can be issued on all of those sessions. The problem is that the
> > subsequent reads will then be able to read all of those
> > locally-committed-but-not-yet-replicated to all of sync standbys txns
> > data - this is what I call an inconsistency (can we call this a
> > read-after-write inconsistency?) because of lack of proper query
> > cancel handling. And if the sync standbys are down or unable to come
> > up for some reason, until then, the primary will be serving clients
> > with the inconsistent data. BTW, I found a report of this problem here
> > [2].
> >
> > The solution proposed for the above problem is to just 'not honor
> > query cancels at all while waiting for ack in SyncRepWaitForLSN()'.
> >
> > When a proc die is pending, then also, there can be
> > locally-committed-but-not-yet-replicated to all of sync standbys txns.
> > Typically, there are two choices for the clients 1) reuse the primary
> > instance after restart 2) failover to one of sync standbys. For case
> > (1), there might be read-after-write inconsistency as explained above.
> > For case (2), those txns might get lost completely if the failover
> > target sync standby or the new primary didn't receive them and the
> > other sync standbys that have received them are now ahead and need a
> > special treatment (run pg_rewind) for them to be able to connect to
> > new primary.
> >
> > The solution proposed for case (1) of the above problem is to 'process
> > the ProcDiePending immediately and upon restart the first backend can
> > wait until the sync standbys are caught up to ensure no inconsistent
> > reads'.
> > The solution proposed for case (2) of the above problem is to 'either
> > run pg_rewind for the sync standbys that are ahead or use the idea
> > proposed at [3]'.
> >
> > I hope the above explanation helps.
> >
> > [1] https://www.postgresql.org/message-id/flat/CALj2ACUrOB59QaE6%3DjF2cFAyv1MR7fzD8tr4YM5%2BOwEYG1SNzA%4...
> > [2] https://stackoverflow.com/questions/42686097/how-to-disable-uncommited-reads-in-postgres-synchronous...
> > [3] https://www.postgresql.org/message-id/CALj2ACX-xO-ZenQt1MWazj0Z3ziSXBMr24N_X2c0dYysPQghrw%40mail.gma...
> 
> I'm attaching the v2 patch rebased on the latest HEAD. Please note
> that there are still some open points, I'm yet to find time to think
> more about them. Meanwhile, I'm posting the v2 patch for making cfbot
> happy. Any further thoughts on the overall design of the patch are
> most welcome. Thanks.
... 
> In PostgreSQL high availability setup with synchronous replication,
> typically all the transactions first locally get committed, then
> streamed to the synchronous standbys and the backends that generated
> the transaction will wait for acknowledgement from synchronous
> standbys. While waiting for acknowledgement, it may happen that the
> query or the transaction gets canceled or the backend that's waiting
> for acknowledgement is asked to exit. In either of these cases, the
> wait for acknowledgement gets canceled and leaves transaction in an
> inconsistent state as it got committed locally but not on the
> standbys. When set the GUC synchronous_replication_naptime_before_cancel
> introduced in this patch, it will let the backends wait for the
> acknowledgement before canceling the wait for acknowledgement so
> that the transaction can be available in synchronous standbys as
> well.

I don't think this patch is going in the right direction, and I think we
need to step back to see why.

First, let's see how synchronous replication works.  Each backend waits
for one or more synchronous replicas to acknowledge the WAL related to
its commit and then it marks the commit as done in PGPROC and then to
the client;  I wrote a blog about it:

	https://momjian.us/main/blogs/pgblog/2020.html#June_3_2020

So, what happens when an insufficient number of synchronous replicas
reply?  Sessions hang because the synchronous behavior cannot be
guaranteed.  We then _allow_ query cancel so the user or administrator
can get out of the hung sessions and perhaps modify
synchronous_standby_names.

What the proposed patch effectively does is to prevent the ability to
recovery the hung sessions or auto-continue the sessions if an
insufficient number of synchronous replicas respond.  So, in a way, it
is allowing for more strict and less strict behavior of
synchronous_standby_names.

However, I think trying to solve this at the session level is the wrong
approach.  If you set a timeout to continue stuck sessions, odds are the
timeout will be too long for each session and performance will be
unacceptable anyway, so you haven't gained much.  If you prevent
cancels, you effectively lock up the system with fewer options of
recovery.

I have always felt this has to be done at the server level, meaning when
a synchronous_standby_names replica is not responding after a certain
timeout, the administrator must be notified by calling a shell command
defined in a GUC and all sessions will ignore the replica.  This gives a
much more predictable and useful behavior than the one in the patch ---
we have discussed this approach many times on the email lists.

Once we have that, we can consider removing the cancel ability while
waiting for synchronous replicas (since we have the timeout) or make it
optional.  We can also consider how do notify the administrator during
query cancel (if we allow it), backend abrupt exit/crash, and if we
should allow users to specify a retry interval to resynchronize the
synchronous replicas.

-- 
  Bruce Momjian  <[email protected]>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson






^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-10-01 01:29  Bharath Rupireddy <[email protected]>
  parent: Bruce Momjian <[email protected]>
  1 sibling, 1 reply; 36+ messages in thread

From: Bharath Rupireddy @ 2022-10-01 01:29 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Fri, Sep 30, 2022 at 4:23 AM Bruce Momjian <[email protected]> wrote:
>
> I don't think this patch is going in the right direction, and I think we
> need to step back to see why.

Thanks a lot Bruce for providing insights.

> First, let's see how synchronous replication works.  Each backend waits
> for one or more synchronous replicas to acknowledge the WAL related to
> its commit and then it marks the commit as done in PGPROC and then to
> the client;  I wrote a blog about it:
>
>         https://momjian.us/main/blogs/pgblog/2020.html#June_3_2020

Great!

> So, what happens when an insufficient number of synchronous replicas
> reply?  Sessions hang because the synchronous behavior cannot be
> guaranteed.  We then _allow_ query cancel so the user or administrator
> can get out of the hung sessions and perhaps modify
> synchronous_standby_names.

Right.

> What the proposed patch effectively does is to prevent the ability to
> recovery the hung sessions or auto-continue the sessions if an
> insufficient number of synchronous replicas respond.  So, in a way, it
> is allowing for more strict and less strict behavior of
> synchronous_standby_names.

Yes. I do agree that it makes the sessions further wait and closes the
quick exit path that admins/users have when the problem occurs. But it
disallows users cancelling queries or terminating backends only to end
up in locally-committed-but-not-replicated transactions on a server
setup where sync standbys all are working fine. I agree that we don't
want to further wait on cancels or proc dies as it makes the system
more unresponsive [1].

> However, I think trying to solve this at the session level is the wrong
> approach.  If you set a timeout to continue stuck sessions, odds are the
> timeout will be too long for each session and performance will be
> unacceptable anyway, so you haven't gained much.  If you prevent
> cancels, you effectively lock up the system with fewer options of
> recovery.

Yes.

> I have always felt this has to be done at the server level, meaning when
> a synchronous_standby_names replica is not responding after a certain
> timeout, the administrator must be notified by calling a shell command
> defined in a GUC and all sessions will ignore the replica.  This gives a
> much more predictable and useful behavior than the one in the patch ---
> we have discussed this approach many times on the email lists.

IIUC, each walsender serving a sync standby will determine that the
sync standby isn't responding for a configurable amount of time (less
than wal_sender_timeout) and calls shell command to notify the admin
if there are any backends waiting for sync replication in
SyncRepWaitForLSN(). The shell command then provides the unresponsive
sync standby name at the bare minimum for the admin to ignore it as
sync standby/remove it from synchronous_standby_names to continue
further. This still requires manual intervention which is a problem if
running postgres server instances at scale. Also, having a new shell
command in any form may pose security risks. I'm not sure at this
point how this new timeout is going to work alongside
wal_sender_timeout.

I'm thinking about the possible options that an admin has to get out
of this situation:
1) Removing the standby from synchronous_standby_names.
2) Fixing the sync standby, by restarting or restoring the lost part
(such as network or some other).

(1) is something that postgres can help admins get out of the problem
easily and automatically without any intervention. (2) is something
postgres can't do much about.

How about we let postgres automatically remove an unresponsive (for a
pre-configured time) sync standby from synchronous_standby_names and
inform the user (via log message and via new walsender property and
pg_stat_replication for monitoring purposes)? The users can then
detect such standbys and later try to bring them back to the sync
standbys group or do other things. I believe that a production level
postgres HA with sync standbys will have monitoring to detect the
replication lag, failover decision etc via monitoring
pg_stat_replication. With this approach, a bit more monitoring is
needed. This solution requires less or no manual intervention and
scales well. Please note that I haven't studied the possibilities of
implementing it yet.

Thoughts?

> Once we have that, we can consider removing the cancel ability while
> waiting for synchronous replicas (since we have the timeout) or make it
> optional.  We can also consider how do notify the administrator during
> query cancel (if we allow it), backend abrupt exit/crash, and

Yeah. If we have the
timeout-and-auto-removal-of-standby-from-sync-standbys-list solution,
the users can then choose to disable processing query cancels/proc
dies while waiting for sync replication in SyncRepWaitForLSN().

> if we
> should allow users to specify a retry interval to resynchronize the
> synchronous replicas.

This is another interesting thing to consider if we were to make the
auto-removed (by the above approach) standby a sync standby again
without manual intervention.

Thoughts?

[1] https://www.postgresql.org/message-id/CA%2BTgmoaCBwgMDkeBDOgtPgHcbfSYq%2BzORjL5DoU3pJyjALxtoQ%40mail...

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-10-05 21:00  Bruce Momjian <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 36+ messages in thread

From: Bruce Momjian @ 2022-10-05 21:00 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Sat, Oct  1, 2022 at 06:59:26AM +0530, Bharath Rupireddy wrote:
> > I have always felt this has to be done at the server level, meaning when
> > a synchronous_standby_names replica is not responding after a certain
> > timeout, the administrator must be notified by calling a shell command
> > defined in a GUC and all sessions will ignore the replica.  This gives a
                         ------------------------------------
> > much more predictable and useful behavior than the one in the patch ---
> > we have discussed this approach many times on the email lists.
> 
> IIUC, each walsender serving a sync standby will determine that the
> sync standby isn't responding for a configurable amount of time (less
> than wal_sender_timeout) and calls shell command to notify the admin
> if there are any backends waiting for sync replication in
> SyncRepWaitForLSN(). The shell command then provides the unresponsive
> sync standby name at the bare minimum for the admin to ignore it as
> sync standby/remove it from synchronous_standby_names to continue
> further. This still requires manual intervention which is a problem if
> running postgres server instances at scale. Also, having a new shell

As I highlighted above, by default you notify the administrator that a
sychronous replica is not responding and then ignore it.  If it becomes
responsive again, you notify the administrator again and add it back as
a sychronous replica.

> command in any form may pose security risks. I'm not sure at this
> point how this new timeout is going to work alongside
> wal_sender_timeout.

We have archive_command, so I don't see a problem with another shell
command.

> I'm thinking about the possible options that an admin has to get out
> of this situation:
> 1) Removing the standby from synchronous_standby_names.

Yes, see above.  We might need a read-only GUC that reports which
sychronous replicas are active.  As you can see, there is a lot of API
design required here, but this is the most effective approach.

> 2) Fixing the sync standby, by restarting or restoring the lost part
> (such as network or some other).
> 
> (1) is something that postgres can help admins get out of the problem
> easily and automatically without any intervention. (2) is something
> postgres can't do much about.
> 
> How about we let postgres automatically remove an unresponsive (for a
> pre-configured time) sync standby from synchronous_standby_names and
> inform the user (via log message and via new walsender property and
> pg_stat_replication for monitoring purposes)? The users can then
> detect such standbys and later try to bring them back to the sync
> standbys group or do other things. I believe that a production level
> postgres HA with sync standbys will have monitoring to detect the
> replication lag, failover decision etc via monitoring
> pg_stat_replication. With this approach, a bit more monitoring is
> needed. This solution requires less or no manual intervention and
> scales well. Please note that I haven't studied the possibilities of
> implementing it yet.
> 
> Thoughts?

Yes, see above.

> > Once we have that, we can consider removing the cancel ability while
> > waiting for synchronous replicas (since we have the timeout) or make it
> > optional.  We can also consider how do notify the administrator during
> > query cancel (if we allow it), backend abrupt exit/crash, and
> 
> Yeah. If we have the
> timeout-and-auto-removal-of-standby-from-sync-standbys-list solution,
> the users can then choose to disable processing query cancels/proc
> dies while waiting for sync replication in SyncRepWaitForLSN().

Yes.  We might also change things so a query cancel that happens during 
sychronous replica waiting can only be done by an administrator, not the
session owner.  Again, lots of design needed here.

> > if we
> > should allow users to specify a retry interval to resynchronize the
> > synchronous replicas.
> 
> This is another interesting thing to consider if we were to make the
> auto-removed (by the above approach) standby a sync standby again
> without manual intervention.

Yes, see above.  You are addressing the right questions here.  :-)

-- 
  Bruce Momjian  <[email protected]>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson






^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-10-06 08:03  Bharath Rupireddy <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 36+ messages in thread

From: Bharath Rupireddy @ 2022-10-06 08:03 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Thu, Oct 6, 2022 at 2:30 AM Bruce Momjian <[email protected]> wrote:
>
> As I highlighted above, by default you notify the administrator that a
> sychronous replica is not responding and then ignore it.  If it becomes
> responsive again, you notify the administrator again and add it back as
> a sychronous replica.
>
> > command in any form may pose security risks. I'm not sure at this
> > point how this new timeout is going to work alongside
> > wal_sender_timeout.
>
> We have archive_command, so I don't see a problem with another shell
> command.

Why do we need a new command to inform the admin/user about a sync
replication being ignored (from sync quorum) for not responding or
acknowledging for a certain amount of time in SyncRepWaitForLSN()?
Can't we just add an extra column or use existing sync_state in
pg_stat_replication()? We can either introduce a new state such as
temporary_async or just use the existing state 'potential' [1]. A
problem is that the server has to be monitored for this extra, new
state. If we do this, we don't need another command to report.

> > I'm thinking about the possible options that an admin has to get out
> > of this situation:
> > 1) Removing the standby from synchronous_standby_names.
>
> Yes, see above.  We might need a read-only GUC that reports which
> sychronous replicas are active.  As you can see, there is a lot of API
> design required here, but this is the most effective approach.

If we use the above approach to report via pg_stat_replication(), we
don't need this.

> > > Once we have that, we can consider removing the cancel ability while
> > > waiting for synchronous replicas (since we have the timeout) or make it
> > > optional.  We can also consider how do notify the administrator during
> > > query cancel (if we allow it), backend abrupt exit/crash, and
> >
> > Yeah. If we have the
> > timeout-and-auto-removal-of-standby-from-sync-standbys-list solution,
> > the users can then choose to disable processing query cancels/proc
> > dies while waiting for sync replication in SyncRepWaitForLSN().
>
> Yes.  We might also change things so a query cancel that happens during
> sychronous replica waiting can only be done by an administrator, not the
> session owner.  Again, lots of design needed here.

Yes, we need infrastructure to track who issued the query cancel or
proc die and so on. IMO, it's not a good way to allow/disallow query
cancels or CTRL+C based on role types - superusers or users with
replication roles or users who are members of any of predefined roles.

In general, it is the walsender serving sync standby that has to mark
itself as async standby by removing itself from
synchronous_standby_names, reloading config variables and waking up
the backends that are waiting in syncrep wait queue for it to update
LSN.

And, the new auto removal timeout should always be set to less than
wal_sender_timeout.

All that said, imagine we have
timeout-and-auto-removal-of-standby-from-sync-standbys-list solution
in one or the other forms with auto removal timeout set to 5 minutes,
any of following can happen:

1) query is stuck waiting for sync standby ack in SyncRepWaitForLSN(),
no query cancel or proc die interrupt is arrived, the sync standby is
made as async standy after the timeout i.e. 5 minutes.
2) query is stuck waiting for sync standby ack in SyncRepWaitForLSN(),
say for about 3 minutes, then query cancel or proc die interrupt is
arrived, should we immediately process it or wait for timeout to
happen (2 more minutes) and then process the interrupt? If we
immediately process the interrupts, then the
locally-committed-but-not-replicated-to-sync-standby problems
described upthread [2] are left unresolved.

[1] https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-REPLICATION-VIEW
sync_state text
Synchronous state of this standby server. Possible values are:
async: This standby server is asynchronous.
potential: This standby server is now asynchronous, but can
potentially become synchronous if one of current synchronous ones
fails.
sync: This standby server is synchronous.
quorum: This standby server is considered as a candidate for quorum standbys.

[2] https://www.postgresql.org/message-id/CALj2ACXmMWtpmuT-%3Dv8F%2BLk4QCbdkeN%2ByHKXeRGKFfjG96YbKA%40ma...

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-10-06 15:42  Bruce Momjian <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 0 replies; 36+ messages in thread

From: Bruce Momjian @ 2022-10-06 15:42 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Thu, Oct  6, 2022 at 01:33:33PM +0530, Bharath Rupireddy wrote:
> On Thu, Oct 6, 2022 at 2:30 AM Bruce Momjian <[email protected]> wrote:
> >
> > As I highlighted above, by default you notify the administrator that a
> > sychronous replica is not responding and then ignore it.  If it becomes
> > responsive again, you notify the administrator again and add it back as
> > a sychronous replica.
> >
> > > command in any form may pose security risks. I'm not sure at this
> > > point how this new timeout is going to work alongside
> > > wal_sender_timeout.
> >
> > We have archive_command, so I don't see a problem with another shell
> > command.
> 
> Why do we need a new command to inform the admin/user about a sync
> replication being ignored (from sync quorum) for not responding or
> acknowledging for a certain amount of time in SyncRepWaitForLSN()?
> Can't we just add an extra column or use existing sync_state in
> pg_stat_replication()? We can either introduce a new state such as
> temporary_async or just use the existing state 'potential' [1]. A
> problem is that the server has to be monitored for this extra, new
> state. If we do this, we don't need another command to report.

Yes, that is a good point.  I assumed people would want notification
immediately rather than waiting for monitoring to notice it.  Consider
if you monitor every five seconds but the primary loses sync and goes
down during that five-second interval --- there would be no way to know
if sync stopped and reported committed transactions to the client before
the primary went down.  I would love to just rely on monitoring but I am
not sure that is sufficient for this use-case.

Of course, if email is being sent it might be still in the email queue
when the primary goes down, but I guess if I was doing it I would make
sure the email was delivered _before_ returning.  The point is that we
would not disable the sync and acknowledge the commit to the client
until the notification command returns success --- that kind of
guarantee is hard to do with monitoring.

These are good discussions to have --- maybe I am wrong.

> > > > Once we have that, we can consider removing the cancel ability while
> > > > waiting for synchronous replicas (since we have the timeout) or make it
> > > > optional.  We can also consider how do notify the administrator during
> > > > query cancel (if we allow it), backend abrupt exit/crash, and
> > >
> > > Yeah. If we have the
> > > timeout-and-auto-removal-of-standby-from-sync-standbys-list solution,
> > > the users can then choose to disable processing query cancels/proc
> > > dies while waiting for sync replication in SyncRepWaitForLSN().
> >
> > Yes.  We might also change things so a query cancel that happens during
> > sychronous replica waiting can only be done by an administrator, not the
> > session owner.  Again, lots of design needed here.
> 
> Yes, we need infrastructure to track who issued the query cancel or
> proc die and so on. IMO, it's not a good way to allow/disallow query
> cancels or CTRL+C based on role types - superusers or users with
> replication roles or users who are members of any of predefined roles.
> 
> In general, it is the walsender serving sync standby that has to mark
> itself as async standby by removing itself from
> synchronous_standby_names, reloading config variables and waking up
> the backends that are waiting in syncrep wait queue for it to update
> LSN.
> 
> And, the new auto removal timeout should always be set to less than
> wal_sender_timeout.
> 
> All that said, imagine we have
> timeout-and-auto-removal-of-standby-from-sync-standbys-list solution
> in one or the other forms with auto removal timeout set to 5 minutes,
> any of following can happen:
> 
> 1) query is stuck waiting for sync standby ack in SyncRepWaitForLSN(),
> no query cancel or proc die interrupt is arrived, the sync standby is
> made as async standy after the timeout i.e. 5 minutes.
> 2) query is stuck waiting for sync standby ack in SyncRepWaitForLSN(),
> say for about 3 minutes, then query cancel or proc die interrupt is
> arrived, should we immediately process it or wait for timeout to
> happen (2 more minutes) and then process the interrupt? If we
> immediately process the interrupts, then the
> locally-committed-but-not-replicated-to-sync-standby problems
> described upthread [2] are left unresolved.

I have a feeling once we have the timeout, we would disable query cancel
when we are in this stage since it is canceling a committed query.  The
timeout would cancel the sync but at least the administrator would know.

-- 
  Bruce Momjian  <[email protected]>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson






^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-09 05:06  Andrey Borodin <[email protected]>
  parent: Bruce Momjian <[email protected]>
  1 sibling, 1 reply; 36+ messages in thread

From: Andrey Borodin @ 2022-11-09 05:06 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Bharath Rupireddy <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Thu, Sep 29, 2022 at 3:53 PM Bruce Momjian <[email protected]> wrote:
>
> So, what happens when an insufficient number of synchronous replicas
> reply?

It's a failover.

> Sessions hang because the synchronous behavior cannot be
> guaranteed.  We then _allow_ query cancel so the user or administrator
> can get out of the hung sessions and perhaps modify
> synchronous_standby_names.

Administrators should not modify synchronous_standby_names.
Administrator must shoot this not in the head.

> I have always felt this has to be done at the server level, meaning when
> a synchronous_standby_names replica is not responding after a certain
> timeout, the administrator must be notified by calling a shell command
> defined in a GUC and all sessions will ignore the replica.

Standbys are expelled from the waitlist according to quorum rules. I'd
propose not to invent more quorum rules involving shell scripts.
The Administrator expressed what number of standbys can be offline by
setting synchronous_standby_names. They actively asked for hanging
queries in case of insufficient standbys.

We have reserved administrator connections for the case when all
connection slots are used by hanging queries.

Best regards, Andrey Borodin.

^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-27 19:26  Andrey Borodin <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 2 replies; 36+ messages in thread

From: Andrey Borodin @ 2022-11-27 19:26 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Bharath Rupireddy <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Tue, Nov 8, 2022 at 9:06 PM Andrey Borodin <[email protected]> wrote:
>
> On Thu, Sep 29, 2022 at 3:53 PM Bruce Momjian <[email protected]> wrote:
> >
> > So, what happens when an insufficient number of synchronous replicas
> > reply?
>
> It's a failover.
>
> > Sessions hang because the synchronous behavior cannot be
> > guaranteed.  We then _allow_ query cancel so the user or administrator
> > can get out of the hung sessions and perhaps modify
> > synchronous_standby_names.
>
> Administrators should not modify synchronous_standby_names.
> Administrator must shoot this not in the head.
>

Some funny stuff. If a user tries to cancel a non-replicated transaction
Azure Postgres will answer: "user requested cancel while waiting for
synchronous replication ack. The COMMIT record has already flushed to
WAL locally and might not have been replicatead to the standby. We
must wait here."
AWS RDS will answer: "ignoring request to cancel wait for synchronous
replication"
Yandex Managed Postgres will answer: "canceling wait for synchronous
replication due requested, but cancelation is not allowed. The
transaction has already committed locally and might not have been
replicated to the standby. We must wait here."

So, for many services providing Postgres as a service it's only a
matter of wording.

Best regards, Andrey Borodin.

^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-28 06:33  Bharath Rupireddy <[email protected]>
  parent: Andrey Borodin <[email protected]>
  1 sibling, 2 replies; 36+ messages in thread

From: Bharath Rupireddy @ 2022-11-28 06:33 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Bruce Momjian <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Mon, Nov 28, 2022 at 12:57 AM Andrey Borodin <[email protected]> wrote:
>
> Some funny stuff. If a user tries to cancel a non-replicated transaction
> Azure Postgres will answer: "user requested cancel while waiting for
> synchronous replication ack. The COMMIT record has already flushed to
> WAL locally and might not have been replicatead to the standby. We
> must wait here."
> AWS RDS will answer: "ignoring request to cancel wait for synchronous
> replication"
> Yandex Managed Postgres will answer: "canceling wait for synchronous
> replication due requested, but cancelation is not allowed. The
> transaction has already committed locally and might not have been
> replicated to the standby. We must wait here."
>
> So, for many services providing Postgres as a service it's only a
> matter of wording.

Thanks for verifying the behaviour. And many thanks for an off-list chat.

FWIW, I'm planning to prepare a patch as per the below idea which is
something similar to the initial proposal in this thread. Meanwhile,
thoughts are welcome.

1. Disable query cancel/CTRL+C/SIGINT when a backend is waiting for
sync replication acknowledgement.
2. Process proc die immediately when a backend is waiting for sync
replication acknowledgement, as it does today, however, upon restart,
don't open up for business (don't accept ready-only connections)
unless the sync standbys have caught up.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-28 20:55  Bruce Momjian <[email protected]>
  parent: Andrey Borodin <[email protected]>
  1 sibling, 0 replies; 36+ messages in thread

From: Bruce Momjian @ 2022-11-28 20:55 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Bharath Rupireddy <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Sun, Nov 27, 2022 at 11:26:50AM -0800, Andrey Borodin wrote:
> Some funny stuff. If a user tries to cancel a non-replicated transaction
> Azure Postgres will answer: "user requested cancel while waiting for
> synchronous replication ack. The COMMIT record has already flushed to
> WAL locally and might not have been replicatead to the standby. We
> must wait here."
> AWS RDS will answer: "ignoring request to cancel wait for synchronous
> replication"
> Yandex Managed Postgres will answer: "canceling wait for synchronous
> replication due requested, but cancelation is not allowed. The
> transaction has already committed locally and might not have been
> replicated to the standby. We must wait here."
> 
> So, for many services providing Postgres as a service it's only a
> matter of wording.

Wow, you are telling me all three cloud vendors changed how query cancel
behaves on an unresponsive synchronous replica?  That is certainly a
testament that the community needs to change or at least review our
behavior.

-- 
  Bruce Momjian  <[email protected]>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-28 20:59  Bruce Momjian <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  1 sibling, 1 reply; 36+ messages in thread

From: Bruce Momjian @ 2022-11-28 20:59 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Mon, Nov 28, 2022 at 12:03:06PM +0530, Bharath Rupireddy wrote:
> Thanks for verifying the behaviour. And many thanks for an off-list chat.
> 
> FWIW, I'm planning to prepare a patch as per the below idea which is
> something similar to the initial proposal in this thread. Meanwhile,
> thoughts are welcome.
> 
> 1. Disable query cancel/CTRL+C/SIGINT when a backend is waiting for
> sync replication acknowledgement.
> 2. Process proc die immediately when a backend is waiting for sync
> replication acknowledgement, as it does today, however, upon restart,
> don't open up for business (don't accept ready-only connections)
> unless the sync standbys have caught up.

You can prepare a patch, but it unlikely to get much interest until you
get agreement on what the behavior should be.  The optimal order of
developer actions is:

	Desirability -> Design -> Implement -> Test -> Review -> Commit
	https://wiki.postgresql.org/wiki/Todo#Development_Process

Telling us what other cloud vendors do is not sufficient.

-- 
  Bruce Momjian  <[email protected]>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-28 21:31  Andrey Borodin <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 36+ messages in thread

From: Andrey Borodin @ 2022-11-28 21:31 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Bharath Rupireddy <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Mon, Nov 28, 2022 at 12:59 PM Bruce Momjian <[email protected]> wrote:
>
> You can prepare a patch, but it unlikely to get much interest until you
> get agreement on what the behavior should be.

We discussed the approach on 2020's Unconference [0]. And there kind
of was an agreement.
Then I made a presentation on FOSDEM with all the details [1].
The patch had been on commitfest since 2019 [2]. There were reviewers
in the CF entry, and we kind of had an agreement.
Jeff Davis proposed a similar patch [3]. And we certainly agreed about cancels.
And now Bharath is proposing the same.

We have the interest and agreement.


Best regards, Andrey Borodin.

[0] https://wiki.postgresql.org/wiki/PgCon_2020_Developer_Unconference/Edge_cases_of_synchronous_replica...
[1] https://archive.fosdem.org/2021/schedule/event/postgresql_caveats_of_replication/attachments/slides/...
[2] https://commitfest.postgresql.org/26/2402/
[3] https://www.postgresql.org/message-id/flat/6a052e81060824a8286148b1165bafedbd7c86cd.camel%40j-davis....





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-28 21:53  Bruce Momjian <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 0 replies; 36+ messages in thread

From: Bruce Momjian @ 2022-11-28 21:53 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Bharath Rupireddy <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>; SATYANARAYANA NARLAPURAM <[email protected]>

On Mon, Nov 28, 2022 at 01:31:39PM -0800, Andrey Borodin wrote:
> On Mon, Nov 28, 2022 at 12:59 PM Bruce Momjian <[email protected]> wrote:
> >
> > You can prepare a patch, but it unlikely to get much interest until you
> > get agreement on what the behavior should be.
> 
> We discussed the approach on 2020's Unconference [0]. And there kind
> of was an agreement.
> Then I made a presentation on FOSDEM with all the details [1].
> The patch had been on commitfest since 2019 [2]. There were reviewers
> in the CF entry, and we kind of had an agreement.
> Jeff Davis proposed a similar patch [3]. And we certainly agreed about cancels.
> And now Bharath is proposing the same.
> 
> We have the interest and agreement.

Okay, I was not aware we had such broad agreement.

-- 
  Bruce Momjian  <[email protected]>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-29 16:14  SATYANARAYANA NARLAPURAM <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  1 sibling, 1 reply; 36+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2022-11-29 16:14 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Bruce Momjian <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>

On Sun, Nov 27, 2022 at 10:33 PM Bharath Rupireddy <
[email protected]> wrote:

> On Mon, Nov 28, 2022 at 12:57 AM Andrey Borodin <[email protected]>
> wrote:
> >
> > Some funny stuff. If a user tries to cancel a non-replicated transaction
> > Azure Postgres will answer: "user requested cancel while waiting for
> > synchronous replication ack. The COMMIT record has already flushed to
> > WAL locally and might not have been replicatead to the standby. We
> > must wait here."
> > AWS RDS will answer: "ignoring request to cancel wait for synchronous
> > replication"
> > Yandex Managed Postgres will answer: "canceling wait for synchronous
> > replication due requested, but cancelation is not allowed. The
> > transaction has already committed locally and might not have been
> > replicated to the standby. We must wait here."
> >
> > So, for many services providing Postgres as a service it's only a
> > matter of wording.
>
> Thanks for verifying the behaviour. And many thanks for an off-list chat.
>
> FWIW, I'm planning to prepare a patch as per the below idea which is
> something similar to the initial proposal in this thread. Meanwhile,
> thoughts are welcome.
>
> 1. Disable query cancel/CTRL+C/SIGINT when a backend is waiting for
> sync replication acknowledgement.
>

+1


> 2. Process proc die immediately when a backend is waiting for sync
> replication acknowledgement, as it does today, however, upon restart,
> don't open up for business (don't accept ready-only connections)
> unless the sync standbys have caught up.
>

Are you planning to block connections or queries to the database? It would
be good to allow connections and let them query the monitoring views but
block the queries until sync standby have caught up. Otherwise, this leaves
a monitoring hole. In cloud, I presume superusers are allowed to connect
and monitor (end customers are not the role members and can't query the
data). The same can't be true for all the installations. Could you please
add more details on your approach?


>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>


^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-29 16:29  Bruce Momjian <[email protected]>
  parent: SATYANARAYANA NARLAPURAM <[email protected]>
  0 siblings, 2 replies; 36+ messages in thread

From: Bruce Momjian @ 2022-11-29 16:29 UTC (permalink / raw)
  To: SATYANARAYANA NARLAPURAM <[email protected]>; +Cc: Bharath Rupireddy <[email protected]>; Andrey Borodin <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
>     2. Process proc die immediately when a backend is waiting for sync
>     replication acknowledgement, as it does today, however, upon restart,
>     don't open up for business (don't accept ready-only connections)
>     unless the sync standbys have caught up.
> 
> 
> Are you planning to block connections or queries to the database? It would be
> good to allow connections and let them query the monitoring views but block the
> queries until sync standby have caught up. Otherwise, this leaves a monitoring
> hole. In cloud, I presume superusers are allowed to connect and monitor (end
> customers are not the role members and can't query the data). The same can't be
> true for all the installations. Could you please add more details on your
> approach?

I think ALTER SYSTEM should be allowed, particularly so you can modify
synchronous_standby_names, no?

-- 
  Bruce Momjian  <[email protected]>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-29 16:42  SATYANARAYANA NARLAPURAM <[email protected]>
  parent: Bruce Momjian <[email protected]>
  1 sibling, 1 reply; 36+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2022-11-29 16:42 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Bharath Rupireddy <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <[email protected]> wrote:

> On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
> >     2. Process proc die immediately when a backend is waiting for sync
> >     replication acknowledgement, as it does today, however, upon restart,
> >     don't open up for business (don't accept ready-only connections)
> >     unless the sync standbys have caught up.
> >
> >
> > Are you planning to block connections or queries to the database? It
> would be
> > good to allow connections and let them query the monitoring views but
> block the
> > queries until sync standby have caught up. Otherwise, this leaves a
> monitoring
> > hole. In cloud, I presume superusers are allowed to connect and monitor
> (end
> > customers are not the role members and can't query the data). The same
> can't be
> > true for all the installations. Could you please add more details on your
> > approach?
>
> I think ALTER SYSTEM should be allowed, particularly so you can modify
> synchronous_standby_names, no?


Yes, Change in synchronous_standby_names is expected in this situation.
IMHO, blocking all the connections is not a recommended approach.


>
> --
>   Bruce Momjian  <[email protected]>        https://momjian.us
>   EDB                                      https://enterprisedb.com
>
> Embrace your flaws.  They make you human, rather than perfect,
> which you will never be.
>


^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-29 17:15  SATYANARAYANA NARLAPURAM <[email protected]>
  parent: SATYANARAYANA NARLAPURAM <[email protected]>
  0 siblings, 0 replies; 36+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2022-11-29 17:15 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Bharath Rupireddy <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, Nov 29, 2022 at 8:42 AM SATYANARAYANA NARLAPURAM <
[email protected]> wrote:

>
>
> On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <[email protected]> wrote:
>
>> On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
>> >     2. Process proc die immediately when a backend is waiting for sync
>> >     replication acknowledgement, as it does today, however, upon
>> restart,
>> >     don't open up for business (don't accept ready-only connections)
>> >     unless the sync standbys have caught up.
>> >
>> >
>> > Are you planning to block connections or queries to the database? It
>> would be
>> > good to allow connections and let them query the monitoring views but
>> block the
>> > queries until sync standby have caught up. Otherwise, this leaves a
>> monitoring
>> > hole. In cloud, I presume superusers are allowed to connect and monitor
>> (end
>> > customers are not the role members and can't query the data). The same
>> can't be
>> > true for all the installations. Could you please add more details on
>> your
>> > approach?
>>
>> I think ALTER SYSTEM should be allowed, particularly so you can modify
>> synchronous_standby_names, no?
>
>
> Yes, Change in synchronous_standby_names is expected in this situation.
> IMHO, blocking all the connections is not a recommended approach.
>

How about allowing superusers (they can still read locally committed data)
and users part of pg_monitor role?


>
>>
>> --
>>   Bruce Momjian  <[email protected]>        https://momjian.us
>>   EDB                                      https://enterprisedb.com
>>
>> Embrace your flaws.  They make you human, rather than perfect,
>> which you will never be.
>>
>


^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-29 18:52  Andrey Borodin <[email protected]>
  parent: Bruce Momjian <[email protected]>
  1 sibling, 1 reply; 36+ messages in thread

From: Andrey Borodin @ 2022-11-29 18:52 UTC (permalink / raw)
  To: Bruce Momjian <[email protected]>; +Cc: SATYANARAYANA NARLAPURAM <[email protected]>; Bharath Rupireddy <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <[email protected]> wrote:
>
> On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
> >     2. Process proc die immediately when a backend is waiting for sync
> >     replication acknowledgement, as it does today, however, upon restart,
> >     don't open up for business (don't accept ready-only connections)
> >     unless the sync standbys have caught up.
> >
> >
> > Are you planning to block connections or queries to the database? It would be
> > good to allow connections and let them query the monitoring views but block the
> > queries until sync standby have caught up. Otherwise, this leaves a monitoring
> > hole. In cloud, I presume superusers are allowed to connect and monitor (end
> > customers are not the role members and can't query the data). The same can't be
> > true for all the installations. Could you please add more details on your
> > approach?
>
> I think ALTER SYSTEM should be allowed, particularly so you can modify
> synchronous_standby_names, no?

We don't allow SQL access during crash recovery until it's caught up
to consistency point. And that's for a reason - the cluster may have
invalid system catalog.
So no, after crash without a quorum of standbys you can only change
auto.conf and send SIGHUP. Accessing the system catalog during crash
recovery is another unrelated problem.

But I'd propose to treat these two points differently, they possess
drastically different scales of danger. Query Cancels are issued here
and there during failovers\switchovers. Crash amidst network
partitioning is not that common.

Best regards, Andrey Borodin.





^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-29 19:20  SATYANARAYANA NARLAPURAM <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 36+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2022-11-29 19:20 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Bruce Momjian <[email protected]>; Bharath Rupireddy <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, Nov 29, 2022 at 10:52 AM Andrey Borodin <[email protected]>
wrote:

> On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <[email protected]> wrote:
> >
> > On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
> > >     2. Process proc die immediately when a backend is waiting for sync
> > >     replication acknowledgement, as it does today, however, upon
> restart,
> > >     don't open up for business (don't accept ready-only connections)
> > >     unless the sync standbys have caught up.
> > >
> > >
> > > Are you planning to block connections or queries to the database? It
> would be
> > > good to allow connections and let them query the monitoring views but
> block the
> > > queries until sync standby have caught up. Otherwise, this leaves a
> monitoring
> > > hole. In cloud, I presume superusers are allowed to connect and
> monitor (end
> > > customers are not the role members and can't query the data). The same
> can't be
> > > true for all the installations. Could you please add more details on
> your
> > > approach?
> >
> > I think ALTER SYSTEM should be allowed, particularly so you can modify
> > synchronous_standby_names, no?
>
> We don't allow SQL access during crash recovery until it's caught up
> to consistency point. And that's for a reason - the cluster may have
> invalid system catalog.
> So no, after crash without a quorum of standbys you can only change
> auto.conf and send SIGHUP. Accessing the system catalog during crash
> recovery is another unrelated problem.
>

In the crash recovery case, catalog is inconsistent but in this case, the
cluster has remote uncommitted changes (consistent). Accepting a superuser
connection is no harm. The auth checks performed are still valid after
standbys fully caught up. I don't see a reason why superuser / pg_monitor
connections are required to be blocked.


> But I'd propose to treat these two points differently, they possess
> drastically different scales of danger. Query Cancels are issued here
> and there during failovers\switchovers. Crash amidst network
> partitioning is not that common.
>

Supportability and operability are more important in corner cases to
quickly troubleshoot an issue,


>
> Best regards, Andrey Borodin.
>


^ permalink  raw  reply  [nested|flat] 36+ messages in thread

* Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
@ 2022-11-29 19:37  SATYANARAYANA NARLAPURAM <[email protected]>
  parent: SATYANARAYANA NARLAPURAM <[email protected]>
  0 siblings, 0 replies; 36+ messages in thread

From: SATYANARAYANA NARLAPURAM @ 2022-11-29 19:37 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Bruce Momjian <[email protected]>; Bharath Rupireddy <[email protected]>; Kyotaro Horiguchi <[email protected]>; Laurenz Albe <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, Nov 29, 2022 at 11:20 AM SATYANARAYANA NARLAPURAM <
[email protected]> wrote:

>
>
> On Tue, Nov 29, 2022 at 10:52 AM Andrey Borodin <[email protected]>
> wrote:
>
>> On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <[email protected]> wrote:
>> >
>> > On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM
>> wrote:
>> > >     2. Process proc die immediately when a backend is waiting for sync
>> > >     replication acknowledgement, as it does today, however, upon
>> restart,
>> > >     don't open up for business (don't accept ready-only connections)
>> > >     unless the sync standbys have caught up.
>> > >
>> > >
>> > > Are you planning to block connections or queries to the database? It
>> would be
>> > > good to allow connections and let them query the monitoring views but
>> block the
>> > > queries until sync standby have caught up. Otherwise, this leaves a
>> monitoring
>> > > hole. In cloud, I presume superusers are allowed to connect and
>> monitor (end
>> > > customers are not the role members and can't query the data). The
>> same can't be
>> > > true for all the installations. Could you please add more details on
>> your
>> > > approach?
>> >
>> > I think ALTER SYSTEM should be allowed, particularly so you can modify
>> > synchronous_standby_names, no?
>>
>> We don't allow SQL access during crash recovery until it's caught up
>> to consistency point. And that's for a reason - the cluster may have
>> invalid system catalog.
>> So no, after crash without a quorum of standbys you can only change
>> auto.conf and send SIGHUP. Accessing the system catalog during crash
>> recovery is another unrelated problem.
>>
>
> In the crash recovery case, catalog is inconsistent but in this case, the
> cluster has remote uncommitted changes (consistent). Accepting a superuser
> connection is no harm. The auth checks performed are still valid after
> standbys fully caught up. I don't see a reason why superuser / pg_monitor
> connections are required to be blocked.
>

If blocking queries is harder, and superuser is not allowed to connect as
it can read remote uncommitted data,  how about adding a new role that  can
update and reload the server configuration?

>
>
>> But I'd propose to treat these two points differently, they possess
>> drastically different scales of danger. Query Cancels are issued here
>> and there during failovers\switchovers. Crash amidst network
>> partitioning is not that common.
>>
>
> Supportability and operability are more important in corner cases to
> quickly troubleshoot an issue,
>
>
>>
>> Best regards, Andrey Borodin.
>>
>


^ permalink  raw  reply  [nested|flat] 36+ messages in thread

end of thread, other threads:[~2022-11-29 19:37 UTC | newest]

Thread overview: 36+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2022-04-26 06:26 Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication Laurenz Albe <[email protected]>
2022-05-09 09:20 ` Bharath Rupireddy <[email protected]>
2022-05-09 09:44   ` Dilip Kumar <[email protected]>
2022-05-09 11:09   ` Andrey Borodin <[email protected]>
2022-05-10 07:48     ` Dilip Kumar <[email protected]>
2022-05-10 07:59       ` Bharath Rupireddy <[email protected]>
2022-05-10 12:25         ` Andrey Borodin <[email protected]>
2022-05-13 12:09           ` Bharath Rupireddy <[email protected]>
2022-07-25 09:29           ` Bharath Rupireddy <[email protected]>
2022-07-25 10:50             ` Andrey Borodin <[email protected]>
2022-08-04 08:12               ` Bharath Rupireddy <[email protected]>
2022-08-04 08:31                 ` Bharath Rupireddy <[email protected]>
2022-08-05 02:49 ` Kyotaro Horiguchi <[email protected]>
2022-08-08 13:43   ` Bharath Rupireddy <[email protected]>
2022-08-09 07:12     ` Kyotaro Horiguchi <[email protected]>
2022-08-09 08:46       ` Bharath Rupireddy <[email protected]>
2022-09-27 13:22         ` Bharath Rupireddy <[email protected]>
2022-09-29 22:53           ` Bruce Momjian <[email protected]>
2022-10-01 01:29             ` Bharath Rupireddy <[email protected]>
2022-10-05 21:00               ` Bruce Momjian <[email protected]>
2022-10-06 08:03                 ` Bharath Rupireddy <[email protected]>
2022-10-06 15:42                   ` Bruce Momjian <[email protected]>
2022-11-09 05:06             ` Andrey Borodin <[email protected]>
2022-11-27 19:26               ` Andrey Borodin <[email protected]>
2022-11-28 06:33                 ` Bharath Rupireddy <[email protected]>
2022-11-28 20:59                   ` Bruce Momjian <[email protected]>
2022-11-28 21:31                     ` Andrey Borodin <[email protected]>
2022-11-28 21:53                       ` Bruce Momjian <[email protected]>
2022-11-29 16:14                   ` SATYANARAYANA NARLAPURAM <[email protected]>
2022-11-29 16:29                     ` Bruce Momjian <[email protected]>
2022-11-29 16:42                       ` SATYANARAYANA NARLAPURAM <[email protected]>
2022-11-29 17:15                         ` SATYANARAYANA NARLAPURAM <[email protected]>
2022-11-29 18:52                       ` Andrey Borodin <[email protected]>
2022-11-29 19:20                         ` SATYANARAYANA NARLAPURAM <[email protected]>
2022-11-29 19:37                           ` SATYANARAYANA NARLAPURAM <[email protected]>
2022-11-28 20:55                 ` Bruce Momjian <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox