public inbox for [email protected]  
From: Xuneng Zhou <[email protected]>
To: Michael Paquier <[email protected]>
Cc: Bertrand Drouvot <[email protected]>
Cc: Hayato Kuroda (Fujitsu) <[email protected]>
Cc: Alexander Lakhin <[email protected]>
Cc: pgsql-hackers <[email protected]>
Subject: Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
Date: Fri, 12 Jun 2026 08:57:05 +0800
Message-ID: <CABPTF7WSpNOYu84fjGH2t56BctRzVD7t8WqhgvML2DRh8Vtfog@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <aiP/X1FThlZRCFiN@bdtpg>
	<OS9PR01MB1214908BA67A7811BD6281208F51C2@OS9PR01MB12149.jpnprd01.prod.outlook.com>
	<CABPTF7WmLKOJcSPod7zT2hynUFQcUs0VoQyR-p=XFsafvmGm7g@mail.gmail.com>
	<CABPTF7U_B9pdC563fFONQLX_FXCZtxZgM+jN9snzMVg9b9MRfg@mail.gmail.com>
	<aibQB2f9h3nYeg/2@bdtpg>
	<CABPTF7XGDgnB97fBUQb+Tm=5NnAzS1iKTWcJTePpGinL1ZAxWQ@mail.gmail.com>
	<CABPTF7VovNWnH=dG_A=e6xkMYt11ubbj935Ymfky55cXcWcrrA@mail.gmail.com>
	<aifyiJG3uiJzCYTB@bdtpg>
	<CABPTF7XAb7ExE-7qSsKgSw3K0hfsNPzXuupQ1aJ8zXOgZ4tPNw@mail.gmail.com>
	<aimeoNYE93VLiQHt@bdtpg>
	<[email protected]>

Hi Michael,

On Thu, Jun 11, 2026 at 9:15 AM Michael Paquier <[email protected]> wrote:
>
> On Wed, Jun 10, 2026 at 05:28:00PM +0000, Bertrand Drouvot wrote:
> > On Wed, Jun 10, 2026 at 04:36:14PM +0800, Xuneng Zhou wrote:
> >> The
> >> essential thing is just to ensure that the startup remains paused
> >> until decoding output is observed.
> >
> > Right, thanks for confirming. That's exactly what v2 is doing.
>
> I have looked at this thread, and my first impression was that this
> could be a data integrity issue while decoding changes due to the
> transient errors one could see across the promotion requests.
>
> But it's less severe than I thought initially: we have an availability
> problem here, down to v16, with a correct recovery possible once the
> promotion request has completed.  That could be indeed surprising for
> users that have HA setups with standbys doing logical decoding.  The
> SQL function path is less worrying to me, there are as far as I know
> few users of it compared to the "native" path with sync workers.
>
> read_local_xlog_page_guts() does not only impact SQL-callable logirep
> functions, even if it is the spot that should be hit most of the time
> (again, the RecoveryInProgress() vs promotion window is super narrow).
> At quick glance, things are:
> - walinspect.
> - Slot advance.
> - Slot creation (?), but it feels even narrower.

Yeah, it is used for two-phase commit as well. Its usage is broader
than I had observed before. The repack worker also makes use of it.

> With two items dealt with on this thread for these two callback paths
> changed, moving the part related to physical replication into its
> own thread would be better.  This requires an entirely different
> analysis and a different lookup.

+1

> The backpatch to PG16 is straightforward and adding
> GetWALInsertionTimeLineIfSet() down there does not look like an issue.
> Not having any tests in v16 feels sad, but that's life.  It does not
> prevent addressing the availability issue on this branch.
>
> I'll go take it up from here.
> --

Thanks for dealing with this!

-- 
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.