public inbox for [email protected]
help / color / mirror / Atom feedt/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
24+ messages / 5 participants
[nested] [flat]
* t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
@ 2026-06-06 09:00 Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Alexander Lakhin @ 2026-06-06 09:00 UTC (permalink / raw)
To: pgsql-hackers
Hello hackers,
I'd like to present a recent failure of 035_standby_logical_decoding on
buildfarm [1]:
### Promoting node "standby"
# Running: pg_ctl --pgdata
/home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata
--log
/home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_standby.log
promote
waiting for server to promote.... done
server promoted
Waiting for replication conn cascading_standby's replay_lsn to pass 0/0648A290 on standby
done
[10:40:16.996](0.446s) ok 84 - got expected output from SQL decoding session on promoted standby
[10:40:16.997](0.001s) # pump_until: process terminated unexpectedly when searching for "(?^s:^.*COMMIT.*COMMIT$)" with
stream: ""
[10:40:16.997](0.000s) not ok 85 - got 2 COMMIT from pg_recvlogical output
[10:40:16.997](0.000s) # Failed test 'got 2 COMMIT from pg_recvlogical output'
# at /home/bf/bf-build/scorpion/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 1036.
[10:40:16.997](0.000s) not ok 86 - got same expected output from pg_recvlogical decoding session
[10:40:16.997](0.000s) # Failed test 'got same expected output from pg_recvlogical decoding session'
# at /home/bf/bf-build/scorpion/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 1040.
[10:40:16.998](0.000s) # got: ''
# expected: 'BEGIN
# table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
# table public.decoding_test: INSERT: x[integer]:2 y[text]:'2'
# table public.decoding_test: INSERT: x[integer]:3 y[text]:'3'
# table public.decoding_test: INSERT: x[integer]:4 y[text]:'4'
# COMMIT
# BEGIN
# table public.decoding_test: INSERT: x[integer]:5 y[text]:'5'
# table public.decoding_test: INSERT: x[integer]:6 y[text]:'6'
# table public.decoding_test: INSERT: x[integer]:7 y[text]:'7'
# COMMIT'
[10:40:17.009](0.011s) ok 87 - got expected output from SQL decoding session on cascading standby
[10:40:17.009](0.000s) ok 88 - got 2 COMMIT from pg_recvlogical output
035_standby_logical_decoding_standby.log contains:
2026-06-03 10:40:16.745 CEST [3955206][startup][29/0:0] LOG: received promote request
...
2026-06-03 10:40:16.784 CEST [3955206][startup][29/0:0] LOG: archive recovery complete
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] LOG: starting logical decoding for slot "promotion_activeslot"
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] DETAIL: Streaming transactions committing after 0/06487A20,
reading WAL from 0/064879E8.
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] STATEMENT: START_REPLICATION SLOT "promotion_activeslot"
LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] ERROR: requested WAL segment 000000010000000000000006 has
already been removed
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] STATEMENT: START_REPLICATION SLOT "promotion_activeslot"
LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] LOG: released logical replication slot "promotion_activeslot"
2026-06-03 10:40:16.862 CEST [3963805][walsender][:0] LOG: disconnection: session time: 0:00:00.212 user=bf
database=testdb host=[local]
...
That is, walsender requested WAL segment for timeline 1, while in a
successful run, it reads WAL for timeline 2.
I've managed to reproduce this failure with:
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1502,2 +1502,3 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotAcquire(cmd->slotname, true, true);
+pg_usleep(200000);
Being executed in a loop, the test fails for me within 100 iterations
(within 10 on a slower VM):
ITERATION 23
# +++ tap check in src/test/recovery +++
t/035_standby_logical_decoding.pl .. 84/? # pump_until: process terminated unexpectedly when searching for
"(?^s:^.*COMMIT.*COMMIT$)" with stream: ""
# Failed test 'got 2 COMMIT from pg_recvlogical output'
# at t/035_standby_logical_decoding.pl line 1038.
# Failed test 'got same expected output from pg_recvlogical decoding session'
# at t/035_standby_logical_decoding.pl line 1042.
# got: ''
# expected: 'BEGIN
# table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
...
As far as I can see, the timeline is chosen in logical_read_xlog_page()
depending on the recovery state:
am_cascading_walsender = RecoveryInProgress();
if (am_cascading_walsender)
GetXLogReplayRecPtr(&currTLI);
else
currTLI = GetWALInsertionTimeLine();
With diagnostic logging added here, a failed run shows:
2026-06-06 07:23:01.558 UTC walsender[165403] 035_standby_logical_decoding.pl LOG: starting logical decoding for slot
"promotion_activeslot"
2026-06-06 07:23:01.558 UTC walsender[165403] 035_standby_logical_decoding.pl DETAIL: Streaming transactions committing
after 0/0647C7E8, reading WAL from 0/0647C7B0.
2026-06-06 07:23:01.558 UTC walsender[165403] 035_standby_logical_decoding.pl STATEMENT: START_REPLICATION SLOT
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl LOG: !!!logical_read_xlog_page|
am_cascading_walsender: 1, currTLI: 1
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl STATEMENT: START_REPLICATION SLOT
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl LOG: !!!WalSndSegmentOpen| nextSegNo: 6,
path: pg_wal/000000010000000000000006
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl STATEMENT: START_REPLICATION SLOT
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl ERROR: requested WAL segment
000000010000000000000006 has already been removed
In a successful run, am_cascading_walsender/RecoveryInProgress() == false and currTLI == 2:
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl LOG: starting logical decoding for slot
"promotion_activeslot"
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl DETAIL: Streaming transactions committing
after 0/0647C7E8, reading WAL from 0/0647C7B0.
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl STATEMENT: START_REPLICATION SLOT
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl LOG: !!!logical_read_xlog_page|
am_cascading_walsender: 0, currTLI: 2
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl STATEMENT: START_REPLICATION SLOT
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl LOG: !!!WalSndSegmentOpen| nextSegNo: 6,
path: pg_wal/000000020000000000000006
Reproduced at the commit fcd77d532, which added the test.
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2026-06-03%2008%3A31%3A43
Best regards,
Alexander
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
@ 2026-06-06 11:07 ` Bertrand Drouvot <[email protected]>
2026-06-06 12:56 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
0 siblings, 2 replies; 24+ messages in thread
From: Bertrand Drouvot @ 2026-06-06 11:07 UTC (permalink / raw)
To: Alexander Lakhin <[email protected]>; +Cc: pgsql-hackers
Hi Alexander,
On Sat, Jun 06, 2026 at 12:00:00PM +0300, Alexander Lakhin wrote:
> Hello hackers,
>
> That is, walsender requested WAL segment for timeline 1, while in a
> successful run, it reads WAL for timeline 2.
>
> I've managed to reproduce this failure with:
Thanks for the report and the repro!
> As far as I can see, the timeline is chosen in logical_read_xlog_page()
> depending on the recovery state:
> am_cascading_walsender = RecoveryInProgress();
>
> if (am_cascading_walsender)
> GetXLogReplayRecPtr(&currTLI);
> else
> currTLI = GetWALInsertionTimeLine();
Yeah, it looks like there is a race condition here. I think we should check if
the insertion timeline has already been set (like the walsummarizer is doing).
I'll work on a fix early next week.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
@ 2026-06-06 12:56 ` Xuneng Zhou <[email protected]>
1 sibling, 0 replies; 24+ messages in thread
From: Xuneng Zhou @ 2026-06-06 12:56 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; pgsql-hackers
Hi Bertrand,
On Sat, Jun 6, 2026 at 7:07 PM Bertrand Drouvot
<[email protected]> wrote:
>
> Hi Alexander,
>
> On Sat, Jun 06, 2026 at 12:00:00PM +0300, Alexander Lakhin wrote:
> > Hello hackers,
> >
> > That is, walsender requested WAL segment for timeline 1, while in a
> > successful run, it reads WAL for timeline 2.
> >
> > I've managed to reproduce this failure with:
>
> Thanks for the report and the repro!
>
> > As far as I can see, the timeline is chosen in logical_read_xlog_page()
> > depending on the recovery state:
> > am_cascading_walsender = RecoveryInProgress();
> >
> > if (am_cascading_walsender)
> > GetXLogReplayRecPtr(&currTLI);
> > else
> > currTLI = GetWALInsertionTimeLine();
>
> Yeah, it looks like there is a race condition here. I think we should check if
> the insertion timeline has already been set (like the walsummarizer is doing).
>
> I'll work on a fix early next week.
This looks like the right direction to fix. We may want to apply
similar logic to read_local_xlog_page_guts as well. Although the
failure is reported in walsender, SQL logical decoding uses the local
WAL reader and has the same recovery/TLI pattern.
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
^ permalink raw reply [nested|flat] 24+ messages in thread
* RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
@ 2026-06-08 04:25 ` Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 08:47 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
1 sibling, 2 replies; 24+ messages in thread
From: Hayato Kuroda (Fujitsu) @ 2026-06-08 04:25 UTC (permalink / raw)
To: 'Bertrand Drouvot' <[email protected]>; Alexander Lakhin <[email protected]>; [email protected] <[email protected]>; +Cc: pgsql-hackers
Hi Alexander, Bertrand, Xuneng,
Thanks for seeing the failure. Our team also recognized but could not find the reason.
> Yeah, it looks like there is a race condition here. I think we should check if
> the insertion timeline has already been set (like the walsummarizer is doing).
Sorry for stupid question; I tried to reproduce the failure but could not, see attached.
IIUC, the issue can happen if the walsender must read the WAL record generated
after the promotion but the timeline could not be updated.
However, I think logical_read_xlog_page() is called after the new WAL records
are generated, i.e., am_cascading_walsender has already been false at that time.
So not sure where is the race?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
[application/octet-stream] 0001-WIP-try-reproducing-the-race-condition-for-promotion.patch (6.3K, 2-0001-WIP-try-reproducing-the-race-condition-for-promotion.patch)
download | inline diff:
From 2cf5edb2639104ccc42333e7546c89076309c40f Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <[email protected]>
Date: Mon, 8 Jun 2026 12:42:12 +0900
Subject: [PATCH] WIP: try reproducing the race condition for promotion
---
src/backend/replication/walsender.c | 6 ++
src/test/recovery/meson.build | 1 +
src/test/recovery/t/099_repro.pl | 130 ++++++++++++++++++++++++++++
3 files changed, 137 insertions(+)
create mode 100644 src/test/recovery/t/099_repro.pl
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 04aa770d981..f764007052d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -94,6 +94,7 @@
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/guc.h"
+#include "utils/injection_point.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_lsn.h"
@@ -1103,11 +1104,16 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
*/
am_cascading_walsender = RecoveryInProgress();
+ INJECTION_POINT("logical-read-xlog-page-before-tli", NULL);
+
if (am_cascading_walsender)
GetXLogReplayRecPtr(&currTLI);
else
currTLI = GetWALInsertionTimeLine();
+ elog(LOG, "XXX am_cascading_walsender: %d, currTLI: %u",
+ am_cascading_walsender, currTLI);
+
XLogReadDetermineTimeline(state, targetPagePtr, reqLen, currTLI);
sendTimeLineIsHistoric = (state->currTLI != currTLI);
sendTimeLine = state->currTLI;
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 9eb8ed11425..bfd06a06124 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -62,6 +62,7 @@ tests += {
't/051_effective_wal_level.pl',
't/052_checkpoint_segment_missing.pl',
't/053_standby_login_event_trigger.pl',
+ 't/099_repro.pl',
],
},
}
diff --git a/src/test/recovery/t/099_repro.pl b/src/test/recovery/t/099_repro.pl
new file mode 100644
index 00000000000..909141c0773
--- /dev/null
+++ b/src/test/recovery/t/099_repro.pl
@@ -0,0 +1,130 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Reproducer for wrong timeline bug
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($stdout, $stderr, $cascading_stdout, $cascading_stderr, $handle);
+
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
+my $res;
+
+# Name for the physical slot on primary
+my $primary_slotname = 'primary_physical';
+my $standby_physical_slotname = 'standby_physical';
+
+########################
+# Initialize primary node
+########################
+
+$node_primary->init(allows_streaming => 1, has_archiving => 1);
+$node_primary->append_conf(
+ 'postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+autovacuum = off
+});
+$node_primary->dump_info;
+$node_primary->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$node_primary->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
+# Create the injection_points extension
+$node_primary->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+$node_primary->safe_psql('postgres',
+ qq[SELECT * FROM pg_create_physical_replication_slot('$primary_slotname');]
+);
+
+$node_primary->safe_psql('postgres', "CREATE TABLE foo (id int)");
+
+my $backup_name = 'b1';
+$node_primary->backup($backup_name);
+
+#######################
+# Initialize standby node
+#######################
+
+$node_standby->init_from_backup(
+ $node_primary, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_standby->append_conf(
+ 'postgresql.conf',
+ qq[primary_slot_name = '$primary_slotname'
+ max_replication_slots = 5]);
+$node_standby->start;
+$node_primary->wait_for_replay_catchup($node_standby);
+
+# create logical slot
+$node_standby->create_logical_slot_on_standby($node_primary, 'testslot',
+ 'postgres');
+
+# Start continuous logical decoding on the standby
+$handle = IPC::Run::start(
+ [
+ 'pg_recvlogical',
+ '--dbname' => $node_standby->connstr('postgres'),
+ '--slot' => 'testslot',
+ '--option' => 'include-xids=0',
+ '--option' => 'skip-empty-xacts=1',
+ '--file' => '-',
+ '--no-loop',
+ '--start',
+ ],
+ '>' => \$stdout,
+ '2>' => \$stderr,
+ IPC::Run::timeout($default_timeout));
+
+# Ensure the pg_recvlogical works well
+$node_primary->safe_psql('postgres', "INSERT INTO foo SELECT generate_series(1, 10)");
+
+# check that we are decoding pre and post promotion inserted rows
+# with pg_recvlogical that has started before the promotion
+my $pump_timeout = IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default);
+
+ok(pump_until($handle, $pump_timeout, \$stdout, qr/^.*COMMIT$/s),
+ 'got 1 COMMIT from pg_recvlogical output');
+
+# Set an injection_point to make the walsender wait before reading the timeline
+# of the standby.
+$node_standby->safe_psql('postgres', "SELECT injection_points_attach('logical-read-xlog-page-before-tli', 'wait');");
+
+# Insert some data to make the walsender read the timeline
+$node_primary->safe_psql('postgres', "INSERT INTO foo SELECT generate_series(11, 20)");
+$node_standby->wait_for_event('walsender', 'logical-read-xlog-page-before-tli');
+
+# Now the standby can accept INSERTs
+$node_standby->promote;
+
+# Insert some data on the promoted standby
+$node_standby->safe_psql('postgres', "INSERT INTO foo SELECT generate_series(21, 30)");
+
+# Walsender now resumes decoding
+$node_standby->safe_psql('postgres', qq{SELECT injection_points_detach('logical-read-xlog-page-before-tli');
+SELECT injection_points_wakeup('logical-read-xlog-page-before-tli');});
+
+# Check that we can decode both pre and post promotion inserted rows with pg_recvlogical.
+ok(pump_until($handle, $pump_timeout, \$stdout, qr/^.*COMMIT.*COMMIT$/s),
+ 'got 2 COMMIT from pg_recvlogical output');
+
+done_testing();
--
2.52.0
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
@ 2026-06-08 08:31 ` Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
1 sibling, 1 reply; 24+ messages in thread
From: Xuneng Zhou @ 2026-06-08 08:31 UTC (permalink / raw)
To: Hayato Kuroda (Fujitsu) <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
Hi Kuroda-san,
On Mon, Jun 8, 2026 at 12:25 PM Hayato Kuroda (Fujitsu)
<[email protected]> wrote:
>
> Hi Alexander, Bertrand, Xuneng,
>
> Thanks for seeing the failure. Our team also recognized but could not find the reason.
>
> > Yeah, it looks like there is a race condition here. I think we should check if
> > the insertion timeline has already been set (like the walsummarizer is doing).
>
> Sorry for stupid question; I tried to reproduce the failure but could not, see attached.
>
> IIUC, the issue can happen if the walsender must read the WAL record generated
> after the promotion but the timeline could not be updated.
I think the race is that the logical walsender is trying to read the
pre-promotion wal from the slot's restart_lsn.
1) In 035_standby_logical_decoding.pl, the active slot /
pg_recvlogical process is started before:
INSERT rows 1..4 # before promotion
promote standby
INSERT rows 5..7 # after promotion
2) Alexander’s reproducer sleeps immediately after:
ReplicationSlotAcquire(cmd->slotname, true, true);
pg_usleep(200000);
So with the delay:
slot is active but walsender has not yet create decoding context and
called xlogbeginread();
3) Then the test continues and promotes the standby while the
walsender still sleeping, therefore promotion could happen before the
walsender starts its actual wal read.
4) The logical walsender is trying to read the pre-promotion wal from
the slot's restart_lsn. The failing log implies this:
Streaming transactions committing after 0/06487A20,
reading WAL from 0/064879E8.
This line is emitted from:
errdetail("Streaming transactions committing after %X/%08X, reading
WAL from %X/%08X.",
slot->data.confirmed_flush,
slot->data.restart_lsn)
Function StartLogicalReplication() initiates the xlogreader at the restart lsn:
XLogBeginRead(logical_decoding_ctx->reader,
MyReplicationSlot->data.restart_lsn);
Because the slot was created and acquired before promotion, its
restart_lsn is from the standby-era state which points to wal required
for decoding the pre-promotion stream. In the test, the first expected
output is rows 1..4, inserted before promotion.
The failing output was empty:
got: ''
expected:
BEGIN
rows 1..4
COMMIT
BEGIN
rows 5..7
COMMIT
Empty output suggests that the walsender died before emitting even the
first pre-promotion transaction.
5) The walsender try to read wal but failed
Failed run:
LOG: !!!WalSndSegmentOpen| nextSegNo: 6,
path: pg_wal/000000010000000000000006
ERROR: requested WAL segment
000000010000000000000006 has already been removed
Passed run
LOG: !!!WalSndSegmentOpen| nextSegNo: 6,
path: pg_wal/000000020000000000000006
As the log suggested, the walsender of the successful run readed the
wal from timeline 2 but failed to do so for timeline1.
This comment in xlogutils.is almost exactly the issue:
We care about timelines ... reading xlog generated prior to a promotion
...
The server copied the segment to the new timeline ... there's no guarantee
the old segment will still exist. It may have been deleted or renamed with
a .partial suffix
So the need for tli1 does not always imply that the wal was generated
after promotion. It could be bytes from before the switchpoint, but
they now be available through the new tli segment file.
6) XLogReadDetermineTimeline() needs the current system-wide TLI.
The current code in does roughly:
am_cascading_walsender = RecoveryInProgress();
if (am_cascading_walsender)
GetXLogReplayRecPtr(&currTLI);
else
currTLI = GetWALInsertionTimeLine();
During promotion, we set the wal insertion timeline before marking
recovery as fully done. So there is a window where:
- InsertTimeLineID is already the new timeline, e.g. tli 2.
- RecoveryInProgress() can still return true in another backend.
- A logical walsender that started before promotion can wake up and
choose its read timeline using the old recovery-state test.
In the race, replay tli 1 is chosen even though the promotion has
created the new tli 2 wal segment copy. Then WalSndSegmentOpen() tries
to open 000000010000000000000006, which may already be gone or
renamed, while the correct file is 000000020000000000000006.
7) Bertrand's proposal to fix this
Recognizing the intermediate promotion state -- if recovery still
shows progress but the insertion tli is already advanced, use that
insertion tli for the timeline to be chosen.
> However, I think logical_read_xlog_page() is called after the new WAL records
> are generated, i.e., am_cascading_walsender has already been false at that time.
> So not sure where is the race?
This function is not only called after new post-promotion records are
generated. When START_REPLICATION begins, CreateDecodingContext() /
XLogBeginRead() starts from the slot's restart_lsn, which can be well
before promotion.
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
@ 2026-06-08 13:00 ` Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Xuneng Zhou @ 2026-06-08 13:00 UTC (permalink / raw)
To: Hayato Kuroda (Fujitsu) <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
On Mon, Jun 8, 2026 at 4:31 PM Xuneng Zhou <[email protected]> wrote:
>
> Hi Kuroda-san,
>
> On Mon, Jun 8, 2026 at 12:25 PM Hayato Kuroda (Fujitsu)
> <[email protected]> wrote:
> >
> > Hi Alexander, Bertrand, Xuneng,
> >
> > Thanks for seeing the failure. Our team also recognized but could not find the reason.
> >
> > > Yeah, it looks like there is a race condition here. I think we should check if
> > > the insertion timeline has already been set (like the walsummarizer is doing).
> >
> > Sorry for stupid question; I tried to reproduce the failure but could not, see attached.
> >
> > IIUC, the issue can happen if the walsender must read the WAL record generated
> > after the promotion but the timeline could not be updated.
>
> I think the race is that the logical walsender is trying to read the
> pre-promotion wal from the slot's restart_lsn.
>
> 1) In 035_standby_logical_decoding.pl, the active slot /
> pg_recvlogical process is started before:
> INSERT rows 1..4 # before promotion
> promote standby
> INSERT rows 5..7 # after promotion
>
> 2) Alexander’s reproducer sleeps immediately after:
> ReplicationSlotAcquire(cmd->slotname, true, true);
> pg_usleep(200000);
>
> So with the delay:
> slot is active but walsender has not yet create decoding context and
> called xlogbeginread();
>
> 3) Then the test continues and promotes the standby while the
> walsender still sleeping, therefore promotion could happen before the
> walsender starts its actual wal read.
>
> 4) The logical walsender is trying to read the pre-promotion wal from
> the slot's restart_lsn. The failing log implies this:
> Streaming transactions committing after 0/06487A20,
> reading WAL from 0/064879E8.
>
> This line is emitted from:
> errdetail("Streaming transactions committing after %X/%08X, reading
> WAL from %X/%08X.",
> slot->data.confirmed_flush,
> slot->data.restart_lsn)
>
> Function StartLogicalReplication() initiates the xlogreader at the restart lsn:
> XLogBeginRead(logical_decoding_ctx->reader,
> MyReplicationSlot->data.restart_lsn);
>
> Because the slot was created and acquired before promotion, its
> restart_lsn is from the standby-era state which points to wal required
> for decoding the pre-promotion stream. In the test, the first expected
> output is rows 1..4, inserted before promotion.
>
> The failing output was empty:
> got: ''
> expected:
> BEGIN
> rows 1..4
> COMMIT
> BEGIN
> rows 5..7
> COMMIT
>
> Empty output suggests that the walsender died before emitting even the
> first pre-promotion transaction.
>
> 5) The walsender try to read wal but failed
> Failed run:
> LOG: !!!WalSndSegmentOpen| nextSegNo: 6,
> path: pg_wal/000000010000000000000006
> ERROR: requested WAL segment
> 000000010000000000000006 has already been removed
>
> Passed run
> LOG: !!!WalSndSegmentOpen| nextSegNo: 6,
> path: pg_wal/000000020000000000000006
>
> As the log suggested, the walsender of the successful run readed the
> wal from timeline 2 but failed to do so for timeline1.
> This comment in xlogutils.is almost exactly the issue:
>
> We care about timelines ... reading xlog generated prior to a promotion
> ...
> The server copied the segment to the new timeline ... there's no guarantee
> the old segment will still exist. It may have been deleted or renamed with
> a .partial suffix
>
> So the need for tli1 does not always imply that the wal was generated
> after promotion. It could be bytes from before the switchpoint, but
> they now be available through the new tli segment file.
>
> 6) XLogReadDetermineTimeline() needs the current system-wide TLI.
>
> The current code in does roughly:
>
> am_cascading_walsender = RecoveryInProgress();
>
> if (am_cascading_walsender)
> GetXLogReplayRecPtr(&currTLI);
> else
> currTLI = GetWALInsertionTimeLine();
>
> During promotion, we set the wal insertion timeline before marking
> recovery as fully done. So there is a window where:
>
> - InsertTimeLineID is already the new timeline, e.g. tli 2.
> - RecoveryInProgress() can still return true in another backend.
> - A logical walsender that started before promotion can wake up and
> choose its read timeline using the old recovery-state test.
>
> In the race, replay tli 1 is chosen even though the promotion has
> created the new tli 2 wal segment copy. Then WalSndSegmentOpen() tries
> to open 000000010000000000000006, which may already be gone or
> renamed, while the correct file is 000000020000000000000006.
>
> 7) Bertrand's proposal to fix this
> Recognizing the intermediate promotion state -- if recovery still
> shows progress but the insertion tli is already advanced, use that
> insertion tli for the timeline to be chosen.
>
> > However, I think logical_read_xlog_page() is called after the new WAL records
> > are generated, i.e., am_cascading_walsender has already been false at that time.
> > So not sure where is the race?
>
> This function is not only called after new post-promotion records are
> generated. When START_REPLICATION begins, CreateDecodingContext() /
> XLogBeginRead() starts from the slot's restart_lsn, which can be well
> before promotion.
>
I tweaked the reproducer based on the theory outlined above. The main
changes from the original reproducer are:
1) blocks at logical-walsender-after-slot-acquire in walsender.c,
before the decoding context is created and before the reader starts
from restart_lsn, matching the delay set by Alexander
2) Forces the first read to occur during promotion. It inserts rows
1..4, waits for replay, starts promotion with pg_promote(false), holds
startup at startup-logical-decoding-status-change-end-of-recovery,
then wakes the walsender.
The test failed on HEAD and passed after applying the patch by Bertrand.
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
Attachments:
[application/octet-stream] v1-0001-Reproducer-for-logical-decoding-choosing-the-wron.patch (5.9K, 2-v1-0001-Reproducer-for-logical-decoding-choosing-the-wron.patch)
download | inline diff:
From 15333302b07bac64e2eb0b984c2b4e037e2566ab Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Mon, 8 Jun 2026 20:47:56 +0800
Subject: [PATCH v1] Reproducer for logical decoding choosing the wrong
timeline while a standby is being promoted.
---
src/backend/replication/walsender.c | 5 +
src/test/recovery/t/099_repro.pl | 139 ++++++++++++++++++++++++++++
2 files changed, 144 insertions(+)
create mode 100644 src/test/recovery/t/099_repro.pl
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 04aa770d981..dbd48d17250 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -94,6 +94,7 @@
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/guc.h"
+#include "utils/injection_point.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_lsn.h"
@@ -1103,6 +1104,8 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
*/
am_cascading_walsender = RecoveryInProgress();
+ INJECTION_POINT("logical-read-xlog-page-before-tli", NULL);
+
if (am_cascading_walsender)
GetXLogReplayRecPtr(&currTLI);
else
@@ -1501,6 +1504,8 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotAcquire(cmd->slotname, true, true);
+ INJECTION_POINT("logical-walsender-after-slot-acquire", NULL);
+
/*
* Force a disconnect, so that the decoding code doesn't need to care
* about an eventual switch from running in recovery, to running in a
diff --git a/src/test/recovery/t/099_repro.pl b/src/test/recovery/t/099_repro.pl
new file mode 100644
index 00000000000..83a42ae507e
--- /dev/null
+++ b/src/test/recovery/t/099_repro.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Reproducer for logical decoding choosing the wrong timeline while a standby
+# is being promoted.
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($stdout, $stderr);
+
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
+
+my $primary_slotname = 'primary_physical';
+
+$node_primary->init(allows_streaming => 1, has_archiving => 1);
+$node_primary->append_conf(
+ 'postgresql.conf', q[
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+autovacuum = off
+]);
+$node_primary->start;
+
+if (!$node_primary->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
+$node_primary->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+$node_primary->safe_psql('postgres',
+ qq[SELECT * FROM pg_create_physical_replication_slot('$primary_slotname');]
+);
+$node_primary->safe_psql('postgres',
+ 'CREATE TABLE decoding_test(x integer, y text);');
+
+my $backup_name = 'b1';
+$node_primary->backup($backup_name);
+
+$node_standby->init_from_backup(
+ $node_primary, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_standby->append_conf(
+ 'postgresql.conf',
+ qq[primary_slot_name = '$primary_slotname'
+max_replication_slots = 4]);
+$node_standby->start;
+$node_primary->wait_for_replay_catchup($node_standby);
+
+$node_standby->create_logical_slot_on_standby($node_primary, 'testslot',
+ 'postgres');
+
+$node_standby->safe_psql('postgres',
+ "SELECT injection_points_attach('logical-walsender-after-slot-acquire', 'wait');"
+);
+
+my $handle = IPC::Run::start(
+ [
+ 'pg_recvlogical',
+ '--dbname' => $node_standby->connstr('postgres'),
+ '--slot' => 'testslot',
+ '--option' => 'include-xids=0',
+ '--option' => 'skip-empty-xacts=1',
+ '--file' => '-',
+ '--no-loop',
+ '--start',
+ ],
+ '>' => \$stdout,
+ '2>' => \$stderr,
+ IPC::Run::timeout($default_timeout));
+
+$node_standby->wait_for_event('walsender',
+ 'logical-walsender-after-slot-acquire');
+
+$node_primary->safe_psql('postgres',
+ qq[INSERT INTO decoding_test(x,y)
+ SELECT s, s::text FROM generate_series(1,4) s;]);
+$node_primary->wait_for_replay_catchup($node_standby);
+
+$node_standby->safe_psql('postgres',
+ qq[SELECT injection_points_attach('startup-logical-decoding-status-change-end-of-recovery', 'wait');]
+);
+
+$node_standby->safe_psql('postgres', 'SELECT pg_promote(false);');
+$node_standby->wait_for_event('startup',
+ 'startup-logical-decoding-status-change-end-of-recovery');
+
+$node_standby->safe_psql('postgres',
+ "SELECT injection_points_wakeup('logical-walsender-after-slot-acquire');"
+);
+
+my $pump_timeout = IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default);
+ok(pump_until($handle, $pump_timeout, \$stdout, qr/^.*COMMIT$/s),
+ 'decoded pre-promotion transaction while promotion is in progress');
+
+$node_standby->safe_psql('postgres',
+ qq[SELECT injection_points_wakeup('startup-logical-decoding-status-change-end-of-recovery');]
+);
+
+$node_standby->poll_query_until('postgres',
+ "SELECT NOT pg_is_in_recovery();")
+ or die "standby did not finish promotion";
+
+$node_standby->safe_psql('postgres',
+ qq[INSERT INTO decoding_test(x,y)
+ SELECT s, s::text FROM generate_series(5,7) s;]);
+
+ok(pump_until($handle, $pump_timeout, \$stdout, qr/^.*COMMIT.*COMMIT$/s),
+ 'decoded pre- and post-promotion transactions');
+
+my $expected = q{BEGIN
+table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
+table public.decoding_test: INSERT: x[integer]:2 y[text]:'2'
+table public.decoding_test: INSERT: x[integer]:3 y[text]:'3'
+table public.decoding_test: INSERT: x[integer]:4 y[text]:'4'
+COMMIT
+BEGIN
+table public.decoding_test: INSERT: x[integer]:5 y[text]:'5'
+table public.decoding_test: INSERT: x[integer]:6 y[text]:'6'
+table public.decoding_test: INSERT: x[integer]:7 y[text]:'7'
+COMMIT};
+
+chomp($stdout);
+is($stdout, $expected, 'got expected output from pg_recvlogical');
+
+done_testing();
--
2.51.0
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
@ 2026-06-08 14:21 ` Bertrand Drouvot <[email protected]>
2026-06-08 14:34 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Bertrand Drouvot @ 2026-06-08 14:21 UTC (permalink / raw)
To: Xuneng Zhou <[email protected]>; +Cc: Hayato Kuroda (Fujitsu) <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
Hi,
On Mon, Jun 08, 2026 at 09:00:17PM +0800, Xuneng Zhou wrote:
> I tweaked the reproducer based on the theory outlined above. The main
> changes from the original reproducer are:
>
> 1) blocks at logical-walsender-after-slot-acquire in walsender.c,
> before the decoding context is created and before the reader starts
> from restart_lsn, matching the delay set by Alexander
> 2) Forces the first read to occur during promotion. It inserts rows
> 1..4, waits for replay, starts promotion with pg_promote(false), holds
> startup at startup-logical-decoding-status-change-end-of-recovery,
> then wakes the walsender.
Yeah, so this existing startup-logical-decoding-status-change-end-of-recovery
injection point already exists in the code tree and is also called after
CleanupAfterArchiveRecovery() and before RECOVERY_STATE_DONE change in
StartupXLOG().
So this is the same window as the new injection point that was added in the new
tests in v1-0002 shared up-thread [1].
That said, I think I prefer the v1-0002 shared up-thread [1] approach for the
tests as:
- the injection point name clearly describes the tested condition
- no new injection point is added in walsender.c (it pauses startup mid-promotion
and lets the walsender connect)
- the test relies on one injection point (not two)
[1]: https://postgr.es/m/aiaBtENl7tTf2MM8%40bdtpg
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
@ 2026-06-08 14:34 ` Xuneng Zhou <[email protected]>
2026-06-09 07:49 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Xuneng Zhou @ 2026-06-08 14:34 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Hayato Kuroda (Fujitsu) <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
On Mon, Jun 8, 2026 at 10:22 PM Bertrand Drouvot
<[email protected]> wrote:
>
> Hi,
>
> On Mon, Jun 08, 2026 at 09:00:17PM +0800, Xuneng Zhou wrote:
> > I tweaked the reproducer based on the theory outlined above. The main
> > changes from the original reproducer are:
> >
> > 1) blocks at logical-walsender-after-slot-acquire in walsender.c,
> > before the decoding context is created and before the reader starts
> > from restart_lsn, matching the delay set by Alexander
>
> > 2) Forces the first read to occur during promotion. It inserts rows
> > 1..4, waits for replay, starts promotion with pg_promote(false), holds
> > startup at startup-logical-decoding-status-change-end-of-recovery,
> > then wakes the walsender.
>
> Yeah, so this existing startup-logical-decoding-status-change-end-of-recovery
> injection point already exists in the code tree and is also called after
> CleanupAfterArchiveRecovery() and before RECOVERY_STATE_DONE change in
> StartupXLOG().
>
> So this is the same window as the new injection point that was added in the new
> tests in v1-0002 shared up-thread [1].
>
> That said, I think I prefer the v1-0002 shared up-thread [1] approach for the
> tests as:
>
> - the injection point name clearly describes the tested condition
> - no new injection point is added in walsender.c (it pauses startup mid-promotion
> and lets the walsender connect)
> - the test relies on one injection point (not two)
>
> [1]: https://postgr.es/m/aiaBtENl7tTf2MM8%40bdtpg
>
Thanks for the clarification. I haven't reviewed the patch set other
than applying the patch 1 yet, but I plan to do so tomorrow.
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 14:34 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
@ 2026-06-09 07:49 ` Xuneng Zhou <[email protected]>
2026-06-09 11:01 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Xuneng Zhou @ 2026-06-09 07:49 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Hayato Kuroda (Fujitsu) <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
On Mon, Jun 8, 2026 at 10:34 PM Xuneng Zhou <[email protected]> wrote:
>
> On Mon, Jun 8, 2026 at 10:22 PM Bertrand Drouvot
> <[email protected]> wrote:
> >
> > Hi,
> >
> > On Mon, Jun 08, 2026 at 09:00:17PM +0800, Xuneng Zhou wrote:
> > > I tweaked the reproducer based on the theory outlined above. The main
> > > changes from the original reproducer are:
> > >
> > > 1) blocks at logical-walsender-after-slot-acquire in walsender.c,
> > > before the decoding context is created and before the reader starts
> > > from restart_lsn, matching the delay set by Alexander
> >
> > > 2) Forces the first read to occur during promotion. It inserts rows
> > > 1..4, waits for replay, starts promotion with pg_promote(false), holds
> > > startup at startup-logical-decoding-status-change-end-of-recovery,
> > > then wakes the walsender.
> >
> > Yeah, so this existing startup-logical-decoding-status-change-end-of-recovery
> > injection point already exists in the code tree and is also called after
> > CleanupAfterArchiveRecovery() and before RECOVERY_STATE_DONE change in
> > StartupXLOG().
> >
> > So this is the same window as the new injection point that was added in the new
> > tests in v1-0002 shared up-thread [1].
> >
> > That said, I think I prefer the v1-0002 shared up-thread [1] approach for the
> > tests as:
> >
> > - the injection point name clearly describes the tested condition
> > - no new injection point is added in walsender.c (it pauses startup mid-promotion
> > and lets the walsender connect)
> > - the test relies on one injection point (not two)
> >
> > [1]: https://postgr.es/m/aiaBtENl7tTf2MM8%40bdtpg
> >
>
> Thanks for the clarification. I haven't reviewed the patch set other
> than applying the patch 1 yet, but I plan to do so tomorrow.
I've readed through the patch set. They look good overall. Here're
some comments on them:
1) In the commit messages and comments for all four patches, the
reason why the target WAL segment cannot be found on the old timeline
is described as follows:
"old timeline WAL segments have already been removed or
recycled by RemoveNonParentXlogFiles() in CleanupAfterArchiveRecovery()."
Is mentioning the 'remove' case only a bit narrow?
The timeline-selection comment says this explicitly:
"there's no guarantee the old segment will still exist. It may have been
deleted or renamed with a .partial suffix"
How about phrasing it like:
old timeline WAL files may have been removed, recycled, or renamed to .partial.
After running the reproducer provided by Hayato-san, the standby’s
pg_wal directory looked like this following the failure:
000000010000000000000003.partial
00000002.history
000000020000000000000003
000000020000000000000004
So in this repro, the requested file:
000000010000000000000003
was not unlinked as a regular "removed" file. It had been renamed to:
000000010000000000000003.partial
but the log says this explicitly:
ERROR: requested WAL segment 000000010000000000000003 has already been removed
It appears inconsistent to me...
2) Injection points in tests 0002 and 0004
> > - the injection point name clearly describes the tested condition
I think this is true. It better reflects the intention than
startup-logical-decoding-status-change-end-of-recovery does. However,
I don't know whether this alone warrants a new injection point.
> > - no new injection point is added in walsender.c (it pauses startup mid-promotion
> > and lets the walsender connect)
> > - the test relies on one injection point (not two)
I am unsure that the injection point could replace the walsender-side
synchronization need.
The race has two moving processes:
- startup process: promotion cleanup / RECOVERY_STATE_DONE transition
- walsender process: logical_read_xlog_page() timeline choice
A startup injection point controls only this:
startup is paused in the vulnerable promotion window
It does not prove this:
walsender has reached logical_read_xlog_page() while startup is paused
3) Stricter synchronization point in both tests
Both tests use this condition "active_pid IS NOT NULL" for
synchronization at the walsender side. However, it only proves that
pg_recvlogical has connected walsender has acquired the logical slot,
not necessarily the walsender is paused after acquiring the slot and
before the promotion window is set. There are several potential states
for walsender in this condition:
walsender is just after ReplicationSlotAcquire()
it has called XLogBeginRead()
it is already inside logical_read_xlog_page()
it already opened the WAL segment
it already failed or succeeded
The test cannot distinguish those states reliably.
So we may still need another injection point for synchronization at
the walsender side
4) Stricter result checks in test files
The surrounding 035 test is stricter than the test in 0002. It first
waits for COMMITs, then compares exact decoded output. Should we
adhere to this pattern too?
0004's \d+ check seems somewhat loose to me. Should we capture the
query result and compare it to the expected count, or select data and
compare the decoded rows?
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 14:34 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 07:49 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
@ 2026-06-09 11:01 ` Bertrand Drouvot <[email protected]>
2026-06-10 08:36 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Bertrand Drouvot @ 2026-06-09 11:01 UTC (permalink / raw)
To: Xuneng Zhou <[email protected]>; +Cc: Hayato Kuroda (Fujitsu) <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
Hi,
On Tue, Jun 09, 2026 at 03:49:50PM +0800, Xuneng Zhou wrote:
> On Mon, Jun 8, 2026 at 10:34 PM Xuneng Zhou <[email protected]> wrote:
> >
> > On Mon, Jun 8, 2026 at 10:22 PM Bertrand Drouvot
> > <[email protected]> wrote:
>
> I've readed through the patch set. They look good overall.
Thanks for the review!
> Here're
> some comments on them:
>
> 1) In the commit messages and comments for all four patches, the
> reason why the target WAL segment cannot be found on the old timeline
> is described as follows:
>
> "old timeline WAL segments have already been removed or
> recycled by RemoveNonParentXlogFiles() in CleanupAfterArchiveRecovery()."
>
> Is mentioning the 'remove' case only a bit narrow?
>
> The timeline-selection comment says this explicitly:
> "there's no guarantee the old segment will still exist. It may have been
> deleted or renamed with a .partial suffix"
>
> How about phrasing it like:
> old timeline WAL files may have been removed, recycled, or renamed to .partial.
>
> After running the reproducer provided by Hayato-san, the standby’s
> pg_wal directory looked like this following the failure:
> 000000010000000000000003.partial
> 00000002.history
> 000000020000000000000003
> 000000020000000000000004
>
> So in this repro, the requested file:
>
> 000000010000000000000003
>
> was not unlinked as a regular "removed" file. It had been renamed to:
>
> 000000010000000000000003.partial
>
> but the log says this explicitly:
> ERROR: requested WAL segment 000000010000000000000003 has already been removed
>
> It appears inconsistent to me...
I'm not sure. The error message says "has already been removed" and the commit
messages and comments says"removed or recycled": those are consistent with the
error message. We're describing the symptom from the walsender's perspective,
not the exact file operation that caused it.
> 2) Injection points in tests 0002 and 0004
>
> It does not prove this:
> walsender has reached logical_read_xlog_page() while startup is paused
>
> 3) Stricter synchronization point in both tests
> Both tests use this condition "active_pid IS NOT NULL" for
> synchronization at the walsender side. However, it only proves that
> pg_recvlogical has connected walsender has acquired the logical slot,
> not necessarily the walsender is paused after acquiring the slot and
> before the promotion window is set. There are several potential states
> for walsender in this condition:
>
> walsender is just after ReplicationSlotAcquire()
> it has called XLogBeginRead()
> it is already inside logical_read_xlog_page()
> it already opened the WAL segment
> it already failed or succeeded
>
> The test cannot distinguish those states reliably.
>
> So we may still need another injection point for synchronization at
> the walsender side
I agree that with v1 the test could have been fragile. It's fixed in v2 without
having to add a second injection point. All we have to do is to ensure that
the decoding occurred while the startup is paused on the new injection point.
0002 does that by starting the new walsender and doing the decoding while the
startup is paused
0004 does that by ensuring the pre-connected session triggers the decoding while
the startup is paused
> 4) Stricter result checks in test files
> The surrounding 035 test is stricter than the test in 0002. It first
> waits for COMMITs, then compares exact decoded output. Should we
> adhere to this pattern too?
That's done in v2 (and also adress Hayato-san feedback).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 14:34 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 07:49 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 11:01 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
@ 2026-06-10 08:36 ` Xuneng Zhou <[email protected]>
2026-06-10 17:28 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Xuneng Zhou @ 2026-06-10 08:36 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Hayato Kuroda (Fujitsu) <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
On Tue, Jun 9, 2026 at 7:01 PM Bertrand Drouvot
<[email protected]> wrote:
>
> Hi,
>
> On Tue, Jun 09, 2026 at 03:49:50PM +0800, Xuneng Zhou wrote:
> > On Mon, Jun 8, 2026 at 10:34 PM Xuneng Zhou <[email protected]> wrote:
> > >
> > > On Mon, Jun 8, 2026 at 10:22 PM Bertrand Drouvot
> > > <[email protected]> wrote:
> >
> > I've readed through the patch set. They look good overall.
>
> Thanks for the review!
>
> > Here're
> > some comments on them:
> >
> > 1) In the commit messages and comments for all four patches, the
> > reason why the target WAL segment cannot be found on the old timeline
> > is described as follows:
> >
> > "old timeline WAL segments have already been removed or
> > recycled by RemoveNonParentXlogFiles() in CleanupAfterArchiveRecovery()."
> >
> > Is mentioning the 'remove' case only a bit narrow?
> >
> > The timeline-selection comment says this explicitly:
> > "there's no guarantee the old segment will still exist. It may have been
> > deleted or renamed with a .partial suffix"
> >
> > How about phrasing it like:
> > old timeline WAL files may have been removed, recycled, or renamed to .partial.
> >
> > After running the reproducer provided by Hayato-san, the standby’s
> > pg_wal directory looked like this following the failure:
> > 000000010000000000000003.partial
> > 00000002.history
> > 000000020000000000000003
> > 000000020000000000000004
> >
> > So in this repro, the requested file:
> >
> > 000000010000000000000003
> >
> > was not unlinked as a regular "removed" file. It had been renamed to:
> >
> > 000000010000000000000003.partial
> >
> > but the log says this explicitly:
> > ERROR: requested WAL segment 000000010000000000000003 has already been removed
> >
> > It appears inconsistent to me...
>
> I'm not sure. The error message says "has already been removed" and the commit
> messages and comments says"removed or recycled": those are consistent with the
> error message. We're describing the symptom from the walsender's perspective,
> not the exact file operation that caused it.
> > 2) Injection points in tests 0002 and 0004
> >
> > It does not prove this:
> > walsender has reached logical_read_xlog_page() while startup is paused
> >
> > 3) Stricter synchronization point in both tests
> > Both tests use this condition "active_pid IS NOT NULL" for
> > synchronization at the walsender side. However, it only proves that
> > pg_recvlogical has connected walsender has acquired the logical slot,
> > not necessarily the walsender is paused after acquiring the slot and
> > before the promotion window is set. There are several potential states
> > for walsender in this condition:
> >
> > walsender is just after ReplicationSlotAcquire()
> > it has called XLogBeginRead()
> > it is already inside logical_read_xlog_page()
> > it already opened the WAL segment
> > it already failed or succeeded
> >
> > The test cannot distinguish those states reliably.
> >
> > So we may still need another injection point for synchronization at
> > the walsender side
>
> I agree that with v1 the test could have been fragile. It's fixed in v2 without
> having to add a second injection point. All we have to do is to ensure that
> the decoding occurred while the startup is paused on the new injection point.
>
> 0002 does that by starting the new walsender and doing the decoding while the
> startup is paused
> 0004 does that by ensuring the pre-connected session triggers the decoding while
> the startup is paused
That should work, and it’s cleverer. I was fixated on the idea that we
needed to start the walsender, pause it, suspend the startup process
to enter the promotion window, and then resume the walsender. The
essential thing is just to ensure that the startup remains paused
until decoding output is observed.
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 14:34 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 07:49 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 11:01 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-10 08:36 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
@ 2026-06-10 17:28 ` Bertrand Drouvot <[email protected]>
2026-06-11 01:15 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Michael Paquier <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Bertrand Drouvot @ 2026-06-10 17:28 UTC (permalink / raw)
To: Xuneng Zhou <[email protected]>; +Cc: Hayato Kuroda (Fujitsu) <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
Hi,
On Wed, Jun 10, 2026 at 04:36:14PM +0800, Xuneng Zhou wrote:
> On Tue, Jun 9, 2026 at 7:01 PM Bertrand Drouvot
> > 0002 does that by starting the new walsender and doing the decoding while the
> > startup is paused
> > 0004 does that by ensuring the pre-connected session triggers the decoding while
> > the startup is paused
>
> The
> essential thing is just to ensure that the startup remains paused
> until decoding output is observed.
Right, thanks for confirming. That's exactly what v2 is doing.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 14:34 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 07:49 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 11:01 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-10 08:36 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-10 17:28 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
@ 2026-06-11 01:15 ` Michael Paquier <[email protected]>
2026-06-11 09:53 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Michael Paquier <[email protected]>
2026-06-12 00:57 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
0 siblings, 2 replies; 24+ messages in thread
From: Michael Paquier @ 2026-06-11 01:15 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Xuneng Zhou <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
On Wed, Jun 10, 2026 at 05:28:00PM +0000, Bertrand Drouvot wrote:
> On Wed, Jun 10, 2026 at 04:36:14PM +0800, Xuneng Zhou wrote:
>> The
>> essential thing is just to ensure that the startup remains paused
>> until decoding output is observed.
>
> Right, thanks for confirming. That's exactly what v2 is doing.
I have looked at this thread, and my first impression was that this
could be a data integrity issue while decoding changes due to the
transient errors one could see across the promotion requests.
But it's less severe than I thought initially: we have an availability
problem here, down to v16, with a correct recovery possible once the
promotion request has completed. That could be indeed surprising for
users that have HA setups with standbys doing logical decoding.. The
SQL function path is less worrying to me, there are as far as I know
few users of it compared to the "native" path with sync workers.
read_local_xlog_page_guts() does not only impact SQL-callable logirep
functions, even it is the spot that should be hit most of the time
(again, the RecoveryInProgress() vs promotion window is super narrow).
At quick glance, things are:
- walinspect.
- Slot advance.
- Slot creation (?), but it feels even narrower.
With two items dealt with on this thread for these two callback paths
changed, moving on the part related to physical replication into its
own thread would be better. This requires an entirely different
analysis and a different lookup.
The backpatch of PG16 is straight-forward and adding
GetWALInsertionTimeLineIfSet() down there does not look like an issue.
Not having any tests in v16 feels sad, but that's life. It does not
prevent addressing the availability issue on this branch.
I'll go take it up from here.
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 14:34 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 07:49 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 11:01 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-10 08:36 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-10 17:28 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-11 01:15 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Michael Paquier <[email protected]>
@ 2026-06-11 09:53 ` Michael Paquier <[email protected]>
1 sibling, 0 replies; 24+ messages in thread
From: Michael Paquier @ 2026-06-11 09:53 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; +Cc: Xuneng Zhou <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
On Thu, Jun 11, 2026 at 10:15:01AM +0900, Michael Paquier wrote:
> The backpatch of PG16 is straight-forward and adding
> GetWALInsertionTimeLineIfSet() down there does not look like an issue.
> Not having any tests in v16 feels sad, but that's life. It does not
> prevent addressing the availability issue on this branch.
Backpatch of GetWALInsertionTimeLineIfSet() done in v16.
> I'll go take it up from here.
I have spent some time on the first part of this patch series,
focusing on the noisiest path for logical replication, and applied it
down to v17. I have been able to reproduce the problem using
Alexander's sleep trick, and let 035 run with the same trick and the
patch applied for a few hundred runs.
Note that REL_17_STABLE has currently 4 failures in the buildfarm, due
to me messing up with the existence of injection_points. I'll
probably just remove the test on this branch, 105b2cb33617 adding the
requirement on the module only in v18~.
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 14:34 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 07:49 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 11:01 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-10 08:36 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-10 17:28 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-11 01:15 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Michael Paquier <[email protected]>
@ 2026-06-12 00:57 ` Xuneng Zhou <[email protected]>
2026-06-12 02:44 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Michael Paquier <[email protected]>
1 sibling, 1 reply; 24+ messages in thread
From: Xuneng Zhou @ 2026-06-12 00:57 UTC (permalink / raw)
To: Michael Paquier <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
Hi Michael,
On Thu, Jun 11, 2026 at 9:15 AM Michael Paquier <[email protected]> wrote:
>
> On Wed, Jun 10, 2026 at 05:28:00PM +0000, Bertrand Drouvot wrote:
> > On Wed, Jun 10, 2026 at 04:36:14PM +0800, Xuneng Zhou wrote:
> >> The
> >> essential thing is just to ensure that the startup remains paused
> >> until decoding output is observed.
> >
> > Right, thanks for confirming. That's exactly what v2 is doing.
>
> I have looked at this thread, and my first impression was that this
> could be a data integrity issue while decoding changes due to the
> transient errors one could see across the promotion requests.
>
> But it's less severe than I thought initially: we have an availability
> problem here, down to v16, with a correct recovery possible once the
> promotion request has completed. That could be indeed surprising for
> users that have HA setups with standbys doing logical decoding.. The
> SQL function path is less worrying to me, there are as far as I know
> few users of it compared to the "native" path with sync workers.
>
> read_local_xlog_page_guts() does not only impact SQL-callable logirep
> functions, even it is the spot that should be hit most of the time
> (again, the RecoveryInProgress() vs promotion window is super narrow).
> At quick glance, things are:
> - walinspect.
> - Slot advance.
> - Slot creation (?), but it feels even narrower.
Yeah, it is used for two-phase commit as well. The usage of it is
broader than I observed before. Repack worker also make use of it.
> With two items dealt with on this thread for these two callback paths
> changed, moving on the part related to physical replication into its
> own thread would be better. This requires an entirely different
> analysis and a different lookup.
+1
> The backpatch of PG16 is straight-forward and adding
> GetWALInsertionTimeLineIfSet() down there does not look like an issue.
> Not having any tests in v16 feels sad, but that's life. It does not
> prevent addressing the availability issue on this branch.
>
> I'll go take it up from here.
> --
Thanks for dealing with this!
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 14:34 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 07:49 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 11:01 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-10 08:36 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-10 17:28 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-11 01:15 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Michael Paquier <[email protected]>
2026-06-12 00:57 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
@ 2026-06-12 02:44 ` Michael Paquier <[email protected]>
2026-06-12 03:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-12 06:28 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
0 siblings, 2 replies; 24+ messages in thread
From: Michael Paquier @ 2026-06-12 02:44 UTC (permalink / raw)
To: Xuneng Zhou <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
On Fri, Jun 12, 2026 at 08:57:05AM +0800, Xuneng Zhou wrote:
> On Thu, Jun 11, 2026 at 9:15 AM Michael Paquier <[email protected]> wrote:
>> read_local_xlog_page_guts() does not only impact SQL-callable logirep
>> functions, even it is the spot that should be hit most of the time
>> (again, the RecoveryInProgress() vs promotion window is super narrow).
>> At quick glance, things are:
>> - walinspect.
>> - Slot advance.
>> - Slot creation (?), but it feels even narrower.
>
> Yeah, it is used for two-phase commit as well. The usage of it is
> broader than I observed before. Repack worker also make use of it.
Repack workers cannot work on standbys, and for 2PC files we have the
guarantee of a file on disk if the WAL is gone, so they should be
basically irrelevant here. That's why I did not mention them. :)
Everything's now done. I was pondering about v14 and v15 for a bit
for the second patch regarding pg_walinspect (or slot operations), but
I could not get excited about these branches. pg_walinspect is less
used, and slots have much more uses on standbys since v16~, so
expanding that further down feels meh.
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 14:34 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 07:49 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 11:01 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-10 08:36 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-10 17:28 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-11 01:15 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Michael Paquier <[email protected]>
2026-06-12 00:57 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-12 02:44 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Michael Paquier <[email protected]>
@ 2026-06-12 03:00 ` Bertrand Drouvot <[email protected]>
1 sibling, 0 replies; 24+ messages in thread
From: Bertrand Drouvot @ 2026-06-12 03:00 UTC (permalink / raw)
To: Michael Paquier <[email protected]>; +Cc: Xuneng Zhou <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
Hi,
On Fri, Jun 12, 2026 at 11:44:47AM +0900, Michael Paquier wrote:
>
> Everything's now done.
Thanks!
> I was pondering about v14 and v15 for a bit
> for the second patch regarding pg_walinspect (or slot operations), but
> I could not get excited about these branches. pg_walinspect is less
> used, and slots have much more uses on standbys since v16~, so
> expanding that further down feels meh.
That sounds reasonable to me.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 14:34 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 07:49 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-09 11:01 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-10 08:36 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-10 17:28 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-11 01:15 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Michael Paquier <[email protected]>
2026-06-12 00:57 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
2026-06-12 02:44 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Michael Paquier <[email protected]>
@ 2026-06-12 06:28 ` Xuneng Zhou <[email protected]>
1 sibling, 0 replies; 24+ messages in thread
From: Xuneng Zhou @ 2026-06-12 06:28 UTC (permalink / raw)
To: Michael Paquier <[email protected]>; +Cc: Bertrand Drouvot <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; Alexander Lakhin <[email protected]>; pgsql-hackers
On Fri, Jun 12, 2026 at 10:44 AM Michael Paquier <[email protected]> wrote:
>
> On Fri, Jun 12, 2026 at 08:57:05AM +0800, Xuneng Zhou wrote:
> > On Thu, Jun 11, 2026 at 9:15 AM Michael Paquier <[email protected]> wrote:
> >> read_local_xlog_page_guts() does not only impact SQL-callable logirep
> >> functions, even it is the spot that should be hit most of the time
> >> (again, the RecoveryInProgress() vs promotion window is super narrow).
> >> At quick glance, things are:
> >> - walinspect.
> >> - Slot advance.
> >> - Slot creation (?), but it feels even narrower.
> >
> > Yeah, it is used for two-phase commit as well. The usage of it is
> > broader than I observed before. Repack worker also make use of it.
>
> Repack workers cannot work on standbys, and for 2PC files we have the
> guarantee of a file on disk if the WAL is gone, so they should be
> basically irrelevant here. That's why I did not mention them. :)
Thanks for the clarification.
> Everything's now done. I was pondering about v14 and v15 for a bit
> for the second patch regarding pg_walinspect (or slot operations), but
> I could not get excited about these branches. pg_walinspect is less
> used, and slots have much more uses on standbys since v16~, so
> expanding that further down feels meh.
Making sense to me. This looks like a hard-to-hit issue on HEAD. The
likelihood of encountering it and the severity of damage caused by it
in those branches are expected to be even smaller.
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
@ 2026-06-08 08:47 ` Bertrand Drouvot <[email protected]>
2026-06-09 04:04 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-09 06:03 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
1 sibling, 2 replies; 24+ messages in thread
From: Bertrand Drouvot @ 2026-06-08 08:47 UTC (permalink / raw)
To: Hayato Kuroda (Fujitsu) <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; [email protected] <[email protected]>; pgsql-hackers
Hi,
On Mon, Jun 08, 2026 at 04:25:45AM +0000, Hayato Kuroda (Fujitsu) wrote:
> Hi Alexander, Bertrand, Xuneng,
>
> Thanks for seeing the failure. Our team also recognized but could not find the reason.
>
> > Yeah, it looks like there is a race condition here. I think we should check if
> > the insertion timeline has already been set (like the walsummarizer is doing).
>
> IIUC, the issue can happen if the walsender must read the WAL record generated
> after the promotion but the timeline could not be updated.
>
> However, I think logical_read_xlog_page() is called after the new WAL records
> are generated, i.e., am_cascading_walsender has already been false at that time.
> So not sure where is the race?
I ended up with this conclusion:
During promotion, there is a window where RecoveryInProgress() still
returns true but old timeline WAL segments have already been removed or
recycled by RemoveNonParentXlogFiles() in CleanupAfterArchiveRecovery().
This is because, in StartupXLOG(), WAL segments are cleaned up before
SharedRecoveryState transitions to RECOVERY_STATE_DONE.
If a walsender performing logical decoding calls logical_read_xlog_page()
during this window, it would get the old timeline from GetXLogReplayRecPtr(),
then attempt to open a WAL segment on that old timeline which no longer exists.
Attached:
0001: To fix this race
Fix by checking GetWALInsertionTimeLineIfSet() when RecoveryInProgress()
returns true. If InsertTimeLineID is already set (non-zero), the new timeline is
established and we use it directly, avoiding attempts to read from segments that
may have been removed.
0002: Adding a test in 035_standby_logical_decoding.pl
It makes use of a new injection point "promotion-after-wal-segment-cleanup" in
StartupXLOG(), right after CleanupAfterArchiveRecovery() removes old timeline
WAL segments but before SharedRecoveryState is set to RECOVERY_STATE_DONE.
The test fails without the fix in 0001 so it also somehow proves that the
diagnostic is right.
0003: Apply the same timeline fix to read_local_xlog_page_guts()
Indeed, it could hit the same race as mentioned by Xuneng-San.
0004: Add a test for 0003
Remark:
As far as the backpatching down to 16, it looks like 0001 to 0004 could be
backpatched as they are down to 17. For 16, we may want to also introduce
GetWALInsertionTimeLineIfSet().
I can have a closer look for the backpatch once we agree on how to fix those
races on master.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 24+ messages in thread
* RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:47 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
@ 2026-06-09 04:04 ` Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-09 05:57 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
1 sibling, 1 reply; 24+ messages in thread
From: Hayato Kuroda (Fujitsu) @ 2026-06-09 04:04 UTC (permalink / raw)
To: 'Bertrand Drouvot' <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; [email protected] <[email protected]>; pgsql-hackers
Dear Bertrand, Xuneng,
Thanks for sharing the reproducer and opinion. I could now understand.
While seeing, I felt the same issue can happen even for the streaming case.
Please see attached reproducer and possible fix.
Can you also include that point? Or this thread discusses only for the logical case?
If separated, I can take initiative for the streaming part, after the logical decoding
case.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
[application/octet-stream] 0001-Reproducer-for-the-streaming-replication.patch (4.6K, 2-0001-Reproducer-for-the-streaming-replication.patch)
download | inline diff:
From 67e407e18c64dff51904232ef98f3f992dd82d03 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <[email protected]>
Date: Tue, 9 Jun 2026 12:48:19 +0900
Subject: [PATCH 1/2] Reproducer for the streaming replication
---
src/test/recovery/meson.build | 1 +
src/test/recovery/t/099_repro.pl | 123 +++++++++++++++++++++++++++++++
2 files changed, 124 insertions(+)
create mode 100644 src/test/recovery/t/099_repro.pl
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index 9eb8ed11425..bfd06a06124 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -62,6 +62,7 @@ tests += {
't/051_effective_wal_level.pl',
't/052_checkpoint_segment_missing.pl',
't/053_standby_login_event_trigger.pl',
+ 't/099_repro.pl',
],
},
}
diff --git a/src/test/recovery/t/099_repro.pl b/src/test/recovery/t/099_repro.pl
new file mode 100644
index 00000000000..34a97bb19f3
--- /dev/null
+++ b/src/test/recovery/t/099_repro.pl
@@ -0,0 +1,123 @@
+# Copyright (c) 2026, PostgreSQL Global Development Group
+
+# Reproducer wrong timeline can be chosen during the promotion for the
+# streaming replication.
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($stdout, $stderr, $handle);
+
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+my $default_timeout = $PostgreSQL::Test::Utils::timeout_default;
+
+# Name for the physical slot on primary
+my $primary_slotname = 'primary_physical';
+my $standby_physical_slotname = 'standby_physical';
+
+########################
+# Initialize primary node
+########################
+
+$node_primary->init(allows_streaming => 1, has_archiving => 1);
+$node_primary->append_conf(
+ 'postgresql.conf', q{
+wal_level = 'logical'
+max_replication_slots = 4
+max_wal_senders = 4
+autovacuum = off
+});
+$node_primary->dump_info;
+$node_primary->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$node_primary->check_extension('injection_points'))
+{
+ plan skip_all => 'Extension injection_points not installed';
+}
+
+$node_primary->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+$node_primary->safe_psql('postgres',
+ qq[SELECT * FROM pg_create_physical_replication_slot('$primary_slotname');]
+);
+
+my $backup_name = 'b1';
+$node_primary->backup($backup_name);
+
+# Some tests need to wait for VACUUM to be replayed. But vacuum does not flush
+# WAL. An insert into flush_wal outside transaction does guarantee a flush.
+$node_primary->psql('postgres', q[CREATE TABLE flush_wal();]);
+
+#######################
+# Initialize standby node
+#######################
+
+$node_standby->init_from_backup(
+ $node_primary, $backup_name,
+ has_streaming => 1,
+ has_restoring => 1);
+$node_standby->append_conf(
+ 'postgresql.conf',
+ qq[primary_slot_name = '$primary_slotname'
+ max_replication_slots = 5]);
+$node_standby->start;
+$node_primary->wait_for_replay_catchup($node_standby);
+
+$node_standby->safe_psql('postgres',
+ "SELECT pg_create_physical_replication_slot('physical_standby');"
+);
+
+# Attach injection point to pause startup after WAL segment cleanup
+# but before RecoveryInProgress() flips to false.
+$node_standby->safe_psql('postgres',
+ "SELECT injection_points_attach('promotion-after-wal-segment-cleanup', 'wait');"
+);
+
+# Promote with no-wait so we can synchronize with the injection point.
+$node_standby->safe_psql('postgres', "SELECT pg_promote(false)");
+
+# Wait for startup to pause after removing old timeline WAL segments.
+$node_standby->wait_for_event('startup',
+ 'promotion-after-wal-segment-cleanup');
+
+my $stream_dir = $node_primary->basedir . '/archive_wal';
+mkdir($stream_dir);
+
+my $log_offset = -s $node_standby->logfile;
+
+# Start pg_receivewal
+$handle = IPC::Run::start(
+ [
+ 'pg_receivewal',
+ '--directory' => $stream_dir,
+ '--dbname' => $node_standby->connstr('postgres'),
+ '--slot' => 'physical_standby',
+ '--no-loop',
+ ],
+ '>' => \$stdout,
+ '2>' => \$stderr,
+ IPC::Run::timeout($default_timeout));
+
+# XXX wait a bit to make sure pg_receivewal has started
+sleep(1);
+
+# Check the log content
+ok( !$node_standby->log_contains(
+ "requested WAL segment [0-9A-F]+ has already been removed",
+ $log_offset),
+ 'check that required WAL segments are still available');
+
+done_testing();
--
2.52.0
[application/octet-stream] 0002-Fix-race-condition-in-streaming-replication-timeline.patch (1.4K, 3-0002-Fix-race-condition-in-streaming-replication-timeline.patch)
download | inline diff:
From 6730d101000ada2a1aa1fe8d5bd17f47b96f4b8b Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <[email protected]>
Date: Tue, 9 Jun 2026 13:01:19 +0900
Subject: [PATCH 2/2] Fix race condition in streaming replication timeline
selection during promotion
---
src/backend/replication/walsender.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e80ed052077..f2e9bbe9d95 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -3389,7 +3389,7 @@ XLogSendPhysical(void)
*/
bool becameHistoric = false;
- SendRqstPtr = GetStandbyFlushRecPtr(&SendRqstTLI);
+ SendRqstPtr = GetStandbyFlushRecPtr(NULL);
if (!RecoveryInProgress())
{
@@ -3400,9 +3400,22 @@ XLogSendPhysical(void)
}
else
{
+ TimeLineID insertTLI;
+
+ /*
+ * If the insertion timeline has already been set, use it. See
+ * logical_read_xlog_page() for details.
+ */
+ insertTLI = GetWALInsertionTimeLine();
+
+ if (insertTLI != 0)
+ SendRqstTLI = insertTLI;
+ else
+ GetXLogReplayRecPtr(&SendRqstTLI);
+
/*
- * Still a cascading standby. But is the timeline we're sending
- * still the one recovery is recovering from?
+ * Is the timeline we're sending still the one recovery is
+ * recovering from?
*/
if (sendTimeLine != SendRqstTLI)
becameHistoric = true;
--
2.52.0
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:47 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-09 04:04 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
@ 2026-06-09 05:57 ` Bertrand Drouvot <[email protected]>
2026-06-09 06:13 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Xuneng Zhou <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Bertrand Drouvot @ 2026-06-09 05:57 UTC (permalink / raw)
To: Hayato Kuroda (Fujitsu) <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; [email protected] <[email protected]>; pgsql-hackers
Hi Hayato-san,
On Tue, Jun 09, 2026 at 04:04:20AM +0000, Hayato Kuroda (Fujitsu) wrote:
> Dear Bertrand, Xuneng,
>
> Thanks for sharing the reproducer and opinion. I could now understand.
>
> While seeing, I felt the same issue can happen even for the streaming case.
> Please see attached reproducer and possible fix.
I just had a quick look and I feel that's probably an issue too (good find!).
I think that your fix should use GetWALInsertionTimeLineIfSet() though (but I
only did a quick scan on your patch).
> Can you also include that point? Or this thread discusses only for the logical case?
> If separated, I can take initiative for the streaming part, after the logical decoding
> case.
I would says it's probably better to keep this thread focusing on Alexander's issue
(logical decoding). Once this thread is fixed, then I'd suggest to open a dedicated
thread for the streaming one (mentioning this thread and probably trying to mimic
its fix as close as possible).
Also I would suggest not to post "real" patches here as they conflict with mine
in the cfbot (means the cfbot won't tell me anymore if my patches need a rebase).
Please use .txt files or use "nocfbot" instead if you feel the need.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:47 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-09 04:04 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-09 05:57 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
@ 2026-06-09 06:13 ` Xuneng Zhou <[email protected]>
0 siblings, 0 replies; 24+ messages in thread
From: Xuneng Zhou @ 2026-06-09 06:13 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; Hayato Kuroda (Fujitsu) <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; pgsql-hackers
On Tue, Jun 9, 2026 at 1:57 PM Bertrand Drouvot
<[email protected]> wrote:
>
> Hi Hayato-san,
>
> On Tue, Jun 09, 2026 at 04:04:20AM +0000, Hayato Kuroda (Fujitsu) wrote:
> > Dear Bertrand, Xuneng,
> >
> > Thanks for sharing the reproducer and opinion. I could now understand.
> >
> > While seeing, I felt the same issue can happen even for the streaming case.
> > Please see attached reproducer and possible fix.
Good catch!
> I just had a quick look and I feel that's probably an issue too (good find!).
> I think that your fix should use GetWALInsertionTimeLineIfSet() though (but I
> only did a quick scan on your patch).
>
> > Can you also include that point? Or this thread discusses only for the logical case?
> > If separated, I can take initiative for the streaming part, after the logical decoding
> > case.
>
> I would says it's probably better to keep this thread focusing on Alexander's issue
> (logical decoding). Once this thread is fixed, then I'd suggest to open a dedicated
> thread for the streaming one (mentioning this thread and probably trying to mimic
> its fix as close as possible).
>
+1 for starting a dedicated thread. Please CC me in that thread in
case I miss it.
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:47 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
@ 2026-06-09 06:03 ` Bertrand Drouvot <[email protected]>
2026-06-09 07:05 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
1 sibling, 1 reply; 24+ messages in thread
From: Bertrand Drouvot @ 2026-06-09 06:03 UTC (permalink / raw)
To: Hayato Kuroda (Fujitsu) <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; [email protected] <[email protected]>; pgsql-hackers
Hi,
On Mon, Jun 08, 2026 at 08:47:48AM +0000, Bertrand Drouvot wrote:
> Attached:
>
> 0001: To fix this race
>
> Fix by checking GetWALInsertionTimeLineIfSet() when RecoveryInProgress()
> returns true. If InsertTimeLineID is already set (non-zero), the new timeline is
> established and we use it directly, avoiding attempts to read from segments that
> may have been removed.
>
> 0002: Adding a test in 035_standby_logical_decoding.pl
>
> It makes use of a new injection point "promotion-after-wal-segment-cleanup" in
> StartupXLOG(), right after CleanupAfterArchiveRecovery() removes old timeline
> WAL segments but before SharedRecoveryState is set to RECOVERY_STATE_DONE.
>
> The test fails without the fix in 0001 so it also somehow proves that the
> diagnostic is right.
>
> 0003: Apply the same timeline fix to read_local_xlog_page_guts()
>
> Indeed, it could hit the same race as mentioned by Xuneng-San.
>
> 0004: Add a test for 0003
Re-attaching v1 patches here to have the cfbot focusing on those.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 24+ messages in thread
* RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-08 04:25 ` RE: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:47 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
2026-06-09 06:03 ` Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Bertrand Drouvot <[email protected]>
@ 2026-06-09 07:05 ` Hayato Kuroda (Fujitsu) <[email protected]>
0 siblings, 0 replies; 24+ messages in thread
From: Hayato Kuroda (Fujitsu) @ 2026-06-09 07:05 UTC (permalink / raw)
To: 'Bertrand Drouvot' <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; [email protected] <[email protected]>; pgsql-hackers
Dear Bertrand,
Thanks for re-posting. Let's focus on the logical decoding stuff.
> > 0004: Add a test for 0003
I found a comment for the test.
```
+# Issue SQL decoding (read_local_xlog_page_guts path) on the pre-connected
+# session.
+$decode_session->query_until(
+ qr/decoding_started/, qq(
+ \\echo decoding_started
+ SELECT count(*) FROM pg_logical_slot_peek_changes('race_slot_sql', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+));
```
Not sure, pg_logical_slot_peek_changes() can be finished without waiting anything,
right? So why do we use query_until() and check the output here?
IIUC, the command can immediately raise an ERROR if the race happened. So isn't it
enough to use safe_psql() to ensure the SQL function can work?
Best regards,
Hayato Kuroda
FUJITSU LIMITED
^ permalink raw reply [nested|flat] 24+ messages in thread
end of thread, other threads:[~2026-06-12 06:28 UTC | newest]
Thread overview: 24+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-06-06 09:00 t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline Alexander Lakhin <[email protected]>
2026-06-06 11:07 ` Bertrand Drouvot <[email protected]>
2026-06-06 12:56 ` Xuneng Zhou <[email protected]>
2026-06-08 04:25 ` Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-08 08:31 ` Xuneng Zhou <[email protected]>
2026-06-08 13:00 ` Xuneng Zhou <[email protected]>
2026-06-08 14:21 ` Bertrand Drouvot <[email protected]>
2026-06-08 14:34 ` Xuneng Zhou <[email protected]>
2026-06-09 07:49 ` Xuneng Zhou <[email protected]>
2026-06-09 11:01 ` Bertrand Drouvot <[email protected]>
2026-06-10 08:36 ` Xuneng Zhou <[email protected]>
2026-06-10 17:28 ` Bertrand Drouvot <[email protected]>
2026-06-11 01:15 ` Michael Paquier <[email protected]>
2026-06-11 09:53 ` Michael Paquier <[email protected]>
2026-06-12 00:57 ` Xuneng Zhou <[email protected]>
2026-06-12 02:44 ` Michael Paquier <[email protected]>
2026-06-12 03:00 ` Bertrand Drouvot <[email protected]>
2026-06-12 06:28 ` Xuneng Zhou <[email protected]>
2026-06-08 08:47 ` Bertrand Drouvot <[email protected]>
2026-06-09 04:04 ` Hayato Kuroda (Fujitsu) <[email protected]>
2026-06-09 05:57 ` Bertrand Drouvot <[email protected]>
2026-06-09 06:13 ` Xuneng Zhou <[email protected]>
2026-06-09 06:03 ` Bertrand Drouvot <[email protected]>
2026-06-09 07:05 ` Hayato Kuroda (Fujitsu) <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox