MIME-Version: 1.0
References: <20220805.114916.994654810780821553.horikyota.ntt@gmail.com>
 <CALj2ACWPMYoPSC3t-9uW+0gqDUcJf1mLww6hHzo2V2AvE-Tu+w@mail.gmail.com>
 <20220809.161236.1486509314201074910.horikyota.ntt@gmail.com>
 <CALj2ACXmMWtpmuT-=v8F+Lk4QCbdkeN+yHKXeRGKFfjG96YbKA@mail.gmail.com>
 <CALj2ACUO6oz-43ryqfMOVZ_Q-N10C5tkzKku12+QV02NnXsDrw@mail.gmail.com>
 <YzYh3NpCQAFkA6lF@momjian.us> <CAAhFRxjFGSk-hVTjnpFwm1XBUcHL8Obugt=P+ixV5AD9H+Kkrw@mail.gmail.com>
 <CAAhFRxgcBy-UCvyJ1ZZ1UKf4Owrx4J2X1F4tN_FD=fh5wZgdkw@mail.gmail.com>
 <CALj2ACVG5KCoPD_5AF2_u07HuZe4ajaLWKycB6OBYsGuj67OhA@mail.gmail.com>
 <CAHg+QDf9sMJ-r9JqFQTALRy8dX8Mr6SoFEvXx8V-Tto10VcFPA@mail.gmail.com>
 <Y4YzWeRgDYOj5Rod@momjian.us> <CAHg+QDcw6er6maVHq5PGR2t7NePXMWD9Vih9TRE=8Z=LDxOE9w@mail.gmail.com>
 <CAHg+QDdz+7H8wuNk9A6nBFei46BWhBq1A37_O4kNmJpuSq+dUg@mail.gmail.com>
In-Reply-To: <CAHg+QDdz+7H8wuNk9A6nBFei46BWhBq1A37_O4kNmJpuSq+dUg@mail.gmail.com>
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Mon, 30 Jan 2023 11:25:23 +0530
Message-ID: <CALj2ACVW1b7ue2qskO-Mef6975Mf3QZJs+47sHAgk8QB-bmDMA@mail.gmail.com>
Subject: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions
 in synchronous replication
To: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>, Bruce Momjian <bruce@momjian.us>, 
	Andrey Borodin <amborodin86@gmail.com>
Cc: Kyotaro Horiguchi <horikyota.ntt@gmail.com>, Laurenz Albe <laurenz.albe@cybertec.at>, 
	PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://www.postgresql.org/message-id/CALj2ACVW1b7ue2qskO-Mef6975Mf3QZJs%2B47sHAgk8QB-bmDMA%40mail.gmail.com>
Precedence: bulk

On Tue, Nov 29, 2022 at 10:45 PM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
>
> On Tue, Nov 29, 2022 at 8:42 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
>>
>> On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <bruce@momjian.us> wrote:
>>>
>>> On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
>>> >     2. Process proc die immediately when a backend is waiting for sync
>>> >     replication acknowledgement, as it does today, however, upon restart,
>>> >     don't open up for business (don't accept read-only connections)
>>> >     unless the sync standbys have caught up.
>>> >
>>> > Are you planning to block connections or queries to the database? It would be
>>> > good to allow connections and let them query the monitoring views but block the
>>> > queries until sync standby have caught up. Otherwise, this leaves a monitoring
>>> > hole. In cloud, I presume superusers are allowed to connect and monitor (end
>>> > customers are not the role members and can't query the data). The same can't be
>>> > true for all the installations. Could you please add more details on your
>>> > approach?
>>>
>>> I think ALTER SYSTEM should be allowed, particularly so you can modify
>>> synchronous_standby_names, no?
>>
>> Yes, Change in synchronous_standby_names is expected in this situation. IMHO, blocking all the connections is not a recommended approach.
>
> How about allowing superusers (they can still read locally committed data) and users part of pg_monitor role?

I started to spend time on this feature again. Thanks all for your
comments so far.

Per the latest comments, it looks like we're mostly okay with emitting
a warning and ignoring query cancel interrupts while waiting for sync
replication ACK.

For proc die, the suggestion was to process it immediately and, upon
the next restart, not allow user connections until all sync standbys
have caught up. However, we still need to allow:

- replication connections from standbys, so that they can stream the
  needed WAL and catch up with the primary;
- connections from superusers or users with the pg_monitor role, so
  that they can run ALTER SYSTEM to remove any unresponsive sync
  standbys from the list, disable sync replication altogether, or
  monitor the flush LSN/catch-up status.

All other connections are blocked. Note that replication, superuser
and pg_monitor connections are allowed only after the server reaches
a consistent state, not before, so that they don't read any
inconsistent data.
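
To make the admission rule above concrete, here is a rough sketch of
the decision. It is only an illustration: ConnKind, ServerState and
connection_allowed() are hypothetical names, not existing PostgreSQL
identifiers, and where such a check would actually live is left open.

```c
#include <stdbool.h>

/* Hypothetical connection classes, for illustration only. */
typedef enum
{
    CONN_REPLICATION,   /* standby connecting to stream WAL */
    CONN_SUPERUSER,
    CONN_PG_MONITOR,    /* user with the pg_monitor role */
    CONN_REGULAR        /* everybody else */
} ConnKind;

/* Hypothetical summary of the server state after restart. */
typedef struct
{
    bool reached_consistency;       /* consistent state reached */
    bool sync_standbys_caught_up;   /* all sync standbys caught up */
} ServerState;

static bool
connection_allowed(const ServerState *state, ConnKind kind)
{
    /* Nobody gets in before the server is consistent. */
    if (!state->reached_consistency)
        return false;

    /* Once the sync standbys have caught up, open up for everyone. */
    if (state->sync_standbys_caught_up)
        return true;

    /* Until then, only the connections needed to recover or monitor. */
    switch (kind)
    {
        case CONN_REPLICATION:
        case CONN_SUPERUSER:
        case CONN_PG_MONITOR:
            return true;
        default:
            return false;
    }
}
```

With reached_consistency = true and sync_standbys_caught_up = false,
this lets in replication, superuser and pg_monitor connections and
rejects regular ones.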

The trickiest part of the above is detecting, upon restart, that the
server received proc die while waiting for sync replication ACK. One
idea is to set a flag in the control file before the crash. A second
idea is to write a marker file (although I don't favor this idea)
whose presence indicates that the server was waiting for sync
replication ACK before the crash. However, neither idea can cover
every kind of crash: a backend waiting for sync replication ACK may go
down without getting the chance to set the flag or write the file.
Therefore, this may not be a complete solution.
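
A minimal sketch of the flag idea, to show where the gap is. The
struct and function names are made up for illustration; in particular
this is not the real control file layout, and durably flushing the
flag is not shown.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a flag persisted in the control file. */
typedef struct
{
    bool died_waiting_for_sync_rep;
} ControlFileSketch;

/* Called from the proc die path; would have to be flushed durably. */
static void
on_proc_die(ControlFileSketch *cf, bool waiting_for_sync_rep_ack)
{
    if (waiting_for_sync_rep_ack)
        cf->died_waiting_for_sync_rep = true;
}

/* Called on restart: should user connections be held back? */
static bool
restart_must_gate_connections(const ControlFileSketch *cf)
{
    return cf->died_waiting_for_sync_rep;
}

/* Called once the sync standbys have caught up. */
static void
on_sync_standbys_caught_up(ControlFileSketch *cf)
{
    cf->died_waiting_for_sync_rep = false;
}
```

Any crash that never reaches on_proc_die() (my assumption for an
example: the process being killed outright) leaves the flag unset, so
restart_must_gate_connections() returns false even though a backend
was waiting. That is the incompleteness mentioned above.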

A third idea is to just make the primary wait for the sync standbys to
catch up upon restart, irrespective of whether or not it crashed
while waiting for sync replication ACK. While this works without
having to detect all sorts of crashes, the primary may not come up if
any unresponsive standbys are present (currently, the primary
continues to be operational, at least for read-only queries,
irrespective of whether the sync standbys have caught up).
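
For the third idea, the check done on restart could be as simple as
the sketch below. The names are hypothetical, and I'm assuming that
"caught up" means every sync standby has flushed WAL at least up to
the primary's flush LSN; a quorum-based rule would look different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSN type for this sketch: a 64-bit WAL position. */
typedef uint64_t LsnSketch;

/*
 * Returns true only if every sync standby has flushed at least up to
 * primary_flush_lsn. With no sync standbys configured (nstandbys == 0)
 * there is nothing to wait for, so it returns true.
 */
static bool
all_sync_standbys_caught_up(LsnSketch primary_flush_lsn,
                            const LsnSketch *standby_flush_lsns,
                            size_t nstandbys)
{
    for (size_t i = 0; i < nstandbys; i++)
    {
        if (standby_flush_lsns[i] < primary_flush_lsn)
            return false;
    }
    return true;
}
```

An unresponsive standby never advances its flush LSN, so this keeps
returning false for it, which is exactly why the primary may not come
up under this approach.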

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com