public inbox for [email protected]
help / color / mirror / Atom feedFrom: Alexander Lakhin <[email protected]>
To: pgsql-hackers <[email protected]>
Subject: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
Date: Sat, 6 Jun 2026 12:00:00 +0300
Message-ID: <[email protected]> (raw)
Hello hackers,
I'd like to present a recent failure of 035_standby_logical_decoding on
buildfarm [1]:
### Promoting node "standby"
# Running: pg_ctl --pgdata
/home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata
--log
/home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_standby.log
promote
waiting for server to promote.... done
server promoted
Waiting for replication conn cascading_standby's replay_lsn to pass 0/0648A290 on standby
done
[10:40:16.996](0.446s) ok 84 - got expected output from SQL decoding session on promoted standby
[10:40:16.997](0.001s) # pump_until: process terminated unexpectedly when searching for "(?^s:^.*COMMIT.*COMMIT$)" with
stream: ""
[10:40:16.997](0.000s) not ok 85 - got 2 COMMIT from pg_recvlogical output
[10:40:16.997](0.000s) # Failed test 'got 2 COMMIT from pg_recvlogical output'
# at /home/bf/bf-build/scorpion/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 1036.
[10:40:16.997](0.000s) not ok 86 - got same expected output from pg_recvlogical decoding session
[10:40:16.997](0.000s) # Failed test 'got same expected output from pg_recvlogical decoding session'
# at /home/bf/bf-build/scorpion/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 1040.
[10:40:16.998](0.000s) # got: ''
# expected: 'BEGIN
# table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
# table public.decoding_test: INSERT: x[integer]:2 y[text]:'2'
# table public.decoding_test: INSERT: x[integer]:3 y[text]:'3'
# table public.decoding_test: INSERT: x[integer]:4 y[text]:'4'
# COMMIT
# BEGIN
# table public.decoding_test: INSERT: x[integer]:5 y[text]:'5'
# table public.decoding_test: INSERT: x[integer]:6 y[text]:'6'
# table public.decoding_test: INSERT: x[integer]:7 y[text]:'7'
# COMMIT'
[10:40:17.009](0.011s) ok 87 - got expected output from SQL decoding session on cascading standby
[10:40:17.009](0.000s) ok 88 - got 2 COMMIT from pg_recvlogical output
035_standby_logical_decoding_standby.log contains:
2026-06-03 10:40:16.745 CEST [3955206][startup][29/0:0] LOG: received promote request
...
2026-06-03 10:40:16.784 CEST [3955206][startup][29/0:0] LOG: archive recovery complete
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] LOG: starting logical decoding for slot "promotion_activeslot"
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] DETAIL: Streaming transactions committing after 0/06487A20,
reading WAL from 0/064879E8.
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] STATEMENT: START_REPLICATION SLOT "promotion_activeslot"
LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] ERROR: requested WAL segment 000000010000000000000006 has
already been removed
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] STATEMENT: START_REPLICATION SLOT "promotion_activeslot"
LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] LOG: released logical replication slot "promotion_activeslot"
2026-06-03 10:40:16.862 CEST [3963805][walsender][:0] LOG: disconnection: session time: 0:00:00.212 user=bf
database=testdb host=[local]
...
That is, walsender requested WAL segment for timeline 1, while in a
successful run, it reads WAL for timeline 2.
I've managed to reproduce this failure with:
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1502,2 +1502,3 @@ StartLogicalReplication(StartReplicationCmd *cmd)
ReplicationSlotAcquire(cmd->slotname, true, true);
+pg_usleep(200000);
Being executed in a loop, the test fails for me within 100 iterations
(within 10 on a slower VM):
ITERATION 23
# +++ tap check in src/test/recovery +++
t/035_standby_logical_decoding.pl .. 84/? # pump_until: process terminated unexpectedly when searching for
"(?^s:^.*COMMIT.*COMMIT$)" with stream: ""
# Failed test 'got 2 COMMIT from pg_recvlogical output'
# at t/035_standby_logical_decoding.pl line 1038.
# Failed test 'got same expected output from pg_recvlogical decoding session'
# at t/035_standby_logical_decoding.pl line 1042.
# got: ''
# expected: 'BEGIN
# table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
...
As far as I can see, the timeline is chosen in logical_read_xlog_page()
depending on the recovery state:
am_cascading_walsender = RecoveryInProgress();
if (am_cascading_walsender)
GetXLogReplayRecPtr(&currTLI);
else
currTLI = GetWALInsertionTimeLine();
With diagnostic logging added here, a failed run shows:
2026-06-06 07:23:01.558 UTC walsender[165403] 035_standby_logical_decoding.pl LOG: starting logical decoding for slot
"promotion_activeslot"
2026-06-06 07:23:01.558 UTC walsender[165403] 035_standby_logical_decoding.pl DETAIL: Streaming transactions committing
after 0/0647C7E8, reading WAL from 0/0647C7B0.
2026-06-06 07:23:01.558 UTC walsender[165403] 035_standby_logical_decoding.pl STATEMENT: START_REPLICATION SLOT
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl LOG: !!!logical_read_xlog_page|
am_cascading_walsender: 1, currTLI: 1
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl STATEMENT: START_REPLICATION SLOT
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl LOG: !!!WalSndSegmentOpen| nextSegNo: 6,
path: pg_wal/000000010000000000000006
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl STATEMENT: START_REPLICATION SLOT
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl ERROR: requested WAL segment
000000010000000000000006 has already been removed
In a successful run, am_cascading_walsender/RecoveryInProgress() == false and currTLI == 2:
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl LOG: starting logical decoding for slot
"promotion_activeslot"
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl DETAIL: Streaming transactions committing
after 0/0647C7E8, reading WAL from 0/0647C7B0.
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl STATEMENT: START_REPLICATION SLOT
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl LOG: !!!logical_read_xlog_page|
am_cascading_walsender: 0, currTLI: 2
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl STATEMENT: START_REPLICATION SLOT
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl LOG: !!!WalSndSegmentOpen| nextSegNo: 6,
path: pg_wal/000000020000000000000006
Reproduced at the commit fcd77d532, which added the test.
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2026-06-03%2008%3A31%3A43
Best regards,
Alexander
view thread (24+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected]
Subject: Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox