Date: Mon, 25 Apr 2022 09:48:13 -0700
From: Nathan Bossart <nathandbossart@gmail.com>
To: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>,
	SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
Subject: Re: An attempt to avoid
 locally-committed-but-not-replicated-to-standby-transactions in synchronous
 replication
Message-ID: <20220425164813.GA2890928@nathanxps13>
References: <CALj2ACUrOB59QaE6=jF2cFAyv1MR7fzD8tr4YM5+OwEYG1SNzA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <CALj2ACUrOB59QaE6=jF2cFAyv1MR7fzD8tr4YM5+OwEYG1SNzA@mail.gmail.com>
Archived-At: <https://www.postgresql.org/message-id/20220425164813.GA2890928%40nathanxps13>
Precedence: bulk

On Mon, Apr 25, 2022 at 07:51:03PM +0530, Bharath Rupireddy wrote:
> With synchronous replication typically all the transactions (txns)
> first locally get committed, then streamed to the sync standbys and
> the backend that generated the transaction will wait for ack from sync
> standbys. While waiting for ack, it may happen that the query or the
> txn gets canceled (QueryCancelPending is true) or the waiting backend
> is asked to exit (ProcDiePending is true). In either of these cases,
> the wait for ack gets canceled and leaves the txn in an inconsistent
> state (as in the client thinks that the txn would have replicated to
> sync standbys) - "The transaction has already committed locally, but
> might not have been replicated to the standby.". Upon restart after
> the crash or in the next txn after the old locally committed txn was
> canceled, the client will be able to see the txns that weren't
> actually streamed to sync standbys. Also, if the client fails over to
> one of the sync standbys after the crash (either by choice or because
> of automatic failover management after crash), the locally committed
> txns on the crashed primary would be lost which isn't good in a true
> HA solution.

This topic has come up a few times recently [0] [1] [2].

> Thoughts?

I think this will require a fair amount of discussion.  I'm personally in
favor of just adding a GUC that can be enabled to block canceling
synchronous replication waits, but I know folks have concerns with that
approach.  When I looked at this previously [2], it seemed possible to
handle the other data loss scenarios (e.g., by forcing failover whenever
the primary shuts down or by turning off restart_after_crash).  However,
I'm not wedded to this approach.
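
To make the GUC idea a bit more concrete, here is a toy Python model of the
wait described in the quoted text.  It is only a sketch, not PostgreSQL code:
the Backend class, wait_for_sync_ack(), and the block_cancel switch (standing
in for the proposed GUC, which has no agreed name) are all hypothetical, and
the standby's acknowledgements are simulated with a list of booleans.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    # Stand-ins for the QueryCancelPending / ProcDiePending flags.
    query_cancel_pending: bool = False
    proc_die_pending: bool = False


def wait_for_sync_ack(backend, acks, block_cancel):
    """Simulate a backend waiting for a sync standby ack after local commit.

    acks holds one boolean per wakeup; True means the standby has confirmed
    the commit.  block_cancel is the hypothetical GUC: when set, pending
    cancel/terminate requests do not end the wait.
    """
    for acked in acks:
        if acked:
            return "replicated"
        interrupted = backend.query_cancel_pending or backend.proc_die_pending
        if interrupted and not block_cancel:
            # Behavior described in the quoted text: the wait ends while the
            # transaction is committed locally but possibly not replicated.
            return "canceled"
        # With block_cancel set, the interrupt is ignored and the wait goes on.
    return "still_waiting"
```

With a cancel pending, wait_for_sync_ack(Backend(query_cancel_pending=True),
[False, False, True], block_cancel=False) gives up on the first wakeup, while
the same call with block_cancel=True keeps waiting until the ack arrives.  The
model deliberately leaves out the crash and restart scenarios mentioned above.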

[0] https://postgr.es/m/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C%40yandex-team.ru
[1] https://postgr.es/m/cac4b9df-92c6-77aa-687b-18b86cb13728%40stratox.cz
[2] https://postgr.es/m/FDE157D7-3F35-450D-B927-7EC2F82DB1D6%40amazon.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com