From: Michael Paquier <[email protected]>
To: Bertrand Drouvot <[email protected]>
Cc: Xuneng Zhou <[email protected]>
Cc: Hayato Kuroda (Fujitsu) <[email protected]>
Cc: Alexander Lakhin <[email protected]>
Cc: pgsql-hackers <[email protected]>
Subject: Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
Date: Thu, 11 Jun 2026 10:15:01 +0900
Message-ID: <[email protected]> (raw)
In-Reply-To: <aimeoNYE93VLiQHt@bdtpg>
References: <aiP/X1FThlZRCFiN@bdtpg>
<OS9PR01MB1214908BA67A7811BD6281208F51C2@OS9PR01MB12149.jpnprd01.prod.outlook.com>
<CABPTF7WmLKOJcSPod7zT2hynUFQcUs0VoQyR-p=XFsafvmGm7g@mail.gmail.com>
<CABPTF7U_B9pdC563fFONQLX_FXCZtxZgM+jN9snzMVg9b9MRfg@mail.gmail.com>
<aibQB2f9h3nYeg/2@bdtpg>
<CABPTF7XGDgnB97fBUQb+Tm=5NnAzS1iKTWcJTePpGinL1ZAxWQ@mail.gmail.com>
<CABPTF7VovNWnH=dG_A=e6xkMYt11ubbj935Ymfky55cXcWcrrA@mail.gmail.com>
<aifyiJG3uiJzCYTB@bdtpg>
<CABPTF7XAb7ExE-7qSsKgSw3K0hfsNPzXuupQ1aJ8zXOgZ4tPNw@mail.gmail.com>
<aimeoNYE93VLiQHt@bdtpg>
On Wed, Jun 10, 2026 at 05:28:00PM +0000, Bertrand Drouvot wrote:
> On Wed, Jun 10, 2026 at 04:36:14PM +0800, Xuneng Zhou wrote:
>> The
>> essential thing is just to ensure that the startup remains paused
>> until decoding output is observed.
>
> Right, thanks for confirming. That's exactly what v2 is doing.
I have looked at this thread, and my first impression was that this
could be a data integrity issue while decoding changes due to the
transient errors one could see across the promotion requests.
But it's less severe than I thought initially: we have an availability
problem here, down to v16, with a correct recovery possible once the
promotion request has completed. That could indeed be surprising for
users who have HA setups with standbys doing logical decoding. The
SQL function path is less worrying to me; as far as I know, it has
few users compared to the "native" path with sync workers.
read_local_xlog_page_guts() does not only impact the SQL-callable
logical replication functions, even if that is the spot that should be
hit most of the time (again, the RecoveryInProgress() vs promotion
window is super narrow). At a quick glance, the other affected paths are:
- pg_walinspect.
- Slot advance.
- Slot creation (?), but it feels even narrower.
With the two items for the two changed callback paths dealt with on
this thread, it would be better to move the part related to physical
replication into its own thread. It requires an entirely different
analysis and a different lookup.
The backpatch to v16 is straightforward, and adding
GetWALInsertionTimeLineIfSet() down there does not look like an issue.
Not having any tests in v16 feels sad, but that's life. It does not
prevent addressing the availability issue on this branch.
I'll go take it up from here.
--
Michael