From: Xuneng Zhou <[email protected]>
To: Michael Paquier <[email protected]>
Cc: Bertrand Drouvot <[email protected]>
Cc: Hayato Kuroda (Fujitsu) <[email protected]>
Cc: Alexander Lakhin <[email protected]>
Cc: pgsql-hackers <[email protected]>
Subject: Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
Date: Fri, 12 Jun 2026 08:57:05 +0800
Message-ID: <CABPTF7WSpNOYu84fjGH2t56BctRzVD7t8WqhgvML2DRh8Vtfog@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <aiP/X1FThlZRCFiN@bdtpg>
<OS9PR01MB1214908BA67A7811BD6281208F51C2@OS9PR01MB12149.jpnprd01.prod.outlook.com>
<CABPTF7WmLKOJcSPod7zT2hynUFQcUs0VoQyR-p=XFsafvmGm7g@mail.gmail.com>
<CABPTF7U_B9pdC563fFONQLX_FXCZtxZgM+jN9snzMVg9b9MRfg@mail.gmail.com>
<aibQB2f9h3nYeg/2@bdtpg>
<CABPTF7XGDgnB97fBUQb+Tm=5NnAzS1iKTWcJTePpGinL1ZAxWQ@mail.gmail.com>
<CABPTF7VovNWnH=dG_A=e6xkMYt11ubbj935Ymfky55cXcWcrrA@mail.gmail.com>
<aifyiJG3uiJzCYTB@bdtpg>
<CABPTF7XAb7ExE-7qSsKgSw3K0hfsNPzXuupQ1aJ8zXOgZ4tPNw@mail.gmail.com>
<aimeoNYE93VLiQHt@bdtpg>
<[email protected]>
Hi Michael,
On Thu, Jun 11, 2026 at 9:15 AM Michael Paquier <[email protected]> wrote:
>
> On Wed, Jun 10, 2026 at 05:28:00PM +0000, Bertrand Drouvot wrote:
> > On Wed, Jun 10, 2026 at 04:36:14PM +0800, Xuneng Zhou wrote:
> >> The
> >> essential thing is just to ensure that the startup remains paused
> >> until decoding output is observed.
> >
> > Right, thanks for confirming. That's exactly what v2 is doing.
>
> I have looked at this thread, and my first impression was that this
> could be a data integrity issue while decoding changes due to the
> transient errors one could see across the promotion requests.
>
> But it's less severe than I thought initially: we have an availability
> problem here, down to v16, with a correct recovery possible once the
> promotion request has completed. That could be indeed surprising for
> users that have HA setups with standbys doing logical decoding.. The
> SQL function path is less worrying to me, there are as far as I know
> few users of it compared to the "native" path with sync workers.
>
> read_local_xlog_page_guts() does not only impact SQL-callable logirep
> functions, even if it is the spot that should be hit most of the time
> (again, the RecoveryInProgress() vs promotion window is super narrow).
> At quick glance, things are:
> - walinspect.
> - Slot advance.
> - Slot creation (?), but it feels even narrower.
Yeah, it is used for two-phase commit as well. Its usage is broader
than I had observed before. The repack worker also makes use of it.
> With two items dealt with on this thread for these two callback paths
> changed, moving on the part related to physical replication into its
> own thread would be better. This requires an entirely different
> analysis and a different lookup.
+1
> The backpatch of PG16 is straight-forward and adding
> GetWALInsertionTimeLineIfSet() down there does not look like an issue.
> Not having any tests in v16 feels sad, but that's life. It does not
> prevent addressing the availability issue on this branch.
>
> I'll go take it up from here.
> --
Thanks for dealing with this!
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.