public inbox for [email protected]  
help / color / mirror / Atom feed
From: Alexander Lakhin <[email protected]>
To: pgsql-hackers <[email protected]>
Subject: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
Date: Sat, 6 Jun 2026 12:00:00 +0300
Message-ID: <[email protected]> (raw)

Hello hackers,

I'd like to present a recent failure of 035_standby_logical_decoding on
buildfarm [1]:
### Promoting node "standby"
# Running: pg_ctl --pgdata 
/home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata 
--log 
/home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_standby.log 
promote
waiting for server to promote.... done
server promoted
Waiting for replication conn cascading_standby's replay_lsn to pass 0/0648A290 on standby
done
[10:40:16.996](0.446s) ok 84 - got expected output from SQL decoding session on promoted standby
[10:40:16.997](0.001s) # pump_until: process terminated unexpectedly when searching for "(?^s:^.*COMMIT.*COMMIT$)" with 
stream: ""
[10:40:16.997](0.000s) not ok 85 - got 2 COMMIT from pg_recvlogical output
[10:40:16.997](0.000s) #   Failed test 'got 2 COMMIT from pg_recvlogical output'
#   at /home/bf/bf-build/scorpion/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 1036.
[10:40:16.997](0.000s) not ok 86 - got same expected output from pg_recvlogical decoding session
[10:40:16.997](0.000s) #   Failed test 'got same expected output from pg_recvlogical decoding session'
#   at /home/bf/bf-build/scorpion/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 1040.
[10:40:16.998](0.000s) #          got: ''
#     expected: 'BEGIN
# table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
# table public.decoding_test: INSERT: x[integer]:2 y[text]:'2'
# table public.decoding_test: INSERT: x[integer]:3 y[text]:'3'
# table public.decoding_test: INSERT: x[integer]:4 y[text]:'4'
# COMMIT
# BEGIN
# table public.decoding_test: INSERT: x[integer]:5 y[text]:'5'
# table public.decoding_test: INSERT: x[integer]:6 y[text]:'6'
# table public.decoding_test: INSERT: x[integer]:7 y[text]:'7'
# COMMIT'
[10:40:17.009](0.011s) ok 87 - got expected output from SQL decoding session on cascading standby
[10:40:17.009](0.000s) ok 88 - got 2 COMMIT from pg_recvlogical output

035_standby_logical_decoding_standby.log contains:
2026-06-03 10:40:16.745 CEST [3955206][startup][29/0:0] LOG: received promote request
...
2026-06-03 10:40:16.784 CEST [3955206][startup][29/0:0] LOG: archive recovery complete
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] LOG: starting logical decoding for slot "promotion_activeslot"
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] DETAIL: Streaming transactions committing after 0/06487A20, 
reading WAL from 0/064879E8.
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] STATEMENT:  START_REPLICATION SLOT "promotion_activeslot" 
LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] ERROR: requested WAL segment 000000010000000000000006 has 
already been removed
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] STATEMENT:  START_REPLICATION SLOT "promotion_activeslot" 
LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] LOG: released logical replication slot "promotion_activeslot"
2026-06-03 10:40:16.862 CEST [3963805][walsender][:0] LOG: disconnection: session time: 0:00:00.212 user=bf 
database=testdb host=[local]
...

That is, walsender requested WAL segment for timeline 1, while in a
successful run, it reads WAL for timeline 2.

I've managed to reproduce this failure with:
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1502,2 +1502,3 @@ StartLogicalReplication(StartReplicationCmd *cmd)
      ReplicationSlotAcquire(cmd->slotname, true, true);
+pg_usleep(200000);

Being executed in a loop, the test fails for me within 100 iterations
(within 10 on a slower VM):
ITERATION 23
# +++ tap check in src/test/recovery +++
t/035_standby_logical_decoding.pl .. 84/? # pump_until: process terminated unexpectedly when searching for 
"(?^s:^.*COMMIT.*COMMIT$)" with stream: ""

#   Failed test 'got 2 COMMIT from pg_recvlogical output'
#   at t/035_standby_logical_decoding.pl line 1038.

#   Failed test 'got same expected output from pg_recvlogical decoding session'
#   at t/035_standby_logical_decoding.pl line 1042.
#          got: ''
#     expected: 'BEGIN
# table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
...

As far as I can see, the timeline is chosen in logical_read_xlog_page()
depending on the recovery state:
         am_cascading_walsender = RecoveryInProgress();

         if (am_cascading_walsender)
                 GetXLogReplayRecPtr(&currTLI);
         else
                 currTLI = GetWALInsertionTimeLine();

With diagnostic logging added here, a failed run shows:
2026-06-06 07:23:01.558 UTC walsender[165403] 035_standby_logical_decoding.pl LOG:  starting logical decoding for slot 
"promotion_activeslot"
2026-06-06 07:23:01.558 UTC walsender[165403] 035_standby_logical_decoding.pl DETAIL:  Streaming transactions committing 
after 0/0647C7E8, reading WAL from 0/0647C7B0.
2026-06-06 07:23:01.558 UTC walsender[165403] 035_standby_logical_decoding.pl STATEMENT:  START_REPLICATION SLOT 
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl LOG:  !!!logical_read_xlog_page| 
am_cascading_walsender: 1, currTLI: 1
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl STATEMENT:  START_REPLICATION SLOT 
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl LOG:  !!!WalSndSegmentOpen| nextSegNo: 6, 
path: pg_wal/000000010000000000000006
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl STATEMENT:  START_REPLICATION SLOT 
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl ERROR:  requested WAL segment 
000000010000000000000006 has already been removed

In a successful run, am_cascading_walsender/RecoveryInProgress() == false and currTLI == 2:
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl LOG:  starting logical decoding for slot 
"promotion_activeslot"
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl DETAIL:  Streaming transactions committing 
after 0/0647C7E8, reading WAL from 0/0647C7B0.
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl STATEMENT:  START_REPLICATION SLOT 
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl LOG:  !!!logical_read_xlog_page| 
am_cascading_walsender: 0, currTLI: 2
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl STATEMENT:  START_REPLICATION SLOT 
"promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1')
2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl LOG:  !!!WalSndSegmentOpen| nextSegNo: 6, 
path: pg_wal/000000020000000000000006

Reproduced at the commit fcd77d532, which added the test.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2026-06-03%2008%3A31%3A43

Best regards,
Alexander






view thread (24+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected]
  Subject: Re: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox