From: Alexander Lakhin <[email protected]>
To: Alexander Korotkov <[email protected]>
To: Xuneng Zhou <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Peter Eisentraut <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Álvaro Herrera <[email protected]>
Cc: Chao Li <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: jian he <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Yura Sokolov <[email protected]>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: Tue, 19 May 2026 23:00:00 +0300
Message-ID: <[email protected]>
In-Reply-To: <CAPpHfdtNiSqQCu+YtTYcc+K4q9FwtZuAtQ5Qs+KoaZZM4QyYTA@mail.gmail.com>
References: <CABPTF7Xs-64GQNjmbimZNhj2YSKbBny+evz6=cp3X2fkJS+vMQ@mail.gmail.com>
<CABPTF7Ub=w7CRxi3sNv8oMGMh4hCqUTohuiTuP9Y1DpxRuFtRQ@mail.gmail.com>
<CAPpHfduJKv9-R2HcpyX9RNgteLL0M1MPS1No1WLnTsegsbG4MQ@mail.gmail.com>
<CABPTF7WWxgAAr5fT9TFciU+PzeRpC3Dp7SO60AV9XWx561TNKA@mail.gmail.com>
<CABPTF7X0n=R50z2fBpj3EbYYz04Ab0-DHJa+JfoAEny62QmUdg@mail.gmail.com>
<CABPTF7U+SUnJX_woQYGe==R9Oz+-V6X0VO2stBLPGfJmH_LEhw@mail.gmail.com>
<CABPTF7UcuVD0L-X=jZFfeygjPaZWWkVRwtWOaJw2tcXbEN2xsA@mail.gmail.com>
<CABPTF7Wdq6KbvC3EhLX3Pz=ODCCPEX7qVQ+E=cokkB91an2E-A@mail.gmail.com>
<CAPpHfdv_BS7csGyg_=pPanRQM9Sf6_wBWNGdVzJRAv0U4eH9cg@mail.gmail.com>
<CAPpHfds7oSCbZqob7ytT_Lso8fv-NW8LnedUTE4Krde+3rkJeA@mail.gmail.com>
<CABPTF7WiDtWCR82geyaaaCRV9UiPR5YUHRNDysD_7Ltr1ymfug@mail.gmail.com>
<CABPTF7Um7oRKBYmek_3gfbaMg5DLzHZmAX-GEacHASfD34xo5g@mail.gmail.com>
<CABPTF7V-E_e3kQ2vtwUz6Jy7u-8_YeUT0SDoAbu7EKPgNp=ndA@mail.gmail.com>
<CAPpHfdtNiSqQCu+YtTYcc+K4q9FwtZuAtQ5Qs+KoaZZM4QyYTA@mail.gmail.com>
Hello Alexander and Xuneng,
06.04.2026 22:49, Alexander Korotkov wrote:
> Thank you, I've pushed your version of patchset. I made two minor
> corrections for patch #2: mention default mode value in the header
> comment, and fallback to polling on has_wal_read_bug sparc64+ext4 bug.
I discovered a new test failure that is apparently caused by the new
wait_for_catchup() implementation [1]:
[06:20:23.110](1.069s) not ok 8 - check that the slot state changes to "extended"
[06:20:23.110](0.001s) # Failed test 'check that the slot state changes to "extended"'
# at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 140.
[06:20:23.111](0.000s) # got: 'unreserved'
# expected: 'extended'
[06:20:23.231](0.120s) not ok 9 - check that the slot state changes to "unreserved"
[06:20:23.231](0.000s) # Failed test 'check that the slot state changes to "unreserved"'
# at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 152.
[06:20:23.231](0.000s) # got: 'lost|'
# expected: 'unreserved|t'
I've managed to reproduce such failures with:
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 07eac07b9ce..493ce92674e 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1143,2 +1143,3 @@ XLogWalRcvSendReply(bool force, bool requestReply, bool checkApply)
+pg_usleep(10000);
/* Get current timestamp. */
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 04aa770d981..19cda3a6b51 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2521,2 +2521,3 @@ ProcessStandbyReplyMessage(void)
+pg_usleep(100000);
/* the caller already consumed the msgtype byte */
Concretely, a loop:
for i in {1..100}; do echo "ITERATION $i"; PROVE_TESTS="t/019*" make -s check -C src/test/recovery/ || break; done
failed for me on iterations 2, 1, and 7 (in three separate runs):
ITERATION 7
# +++ tap check in src/test/recovery +++
t/019_replslot_limit.pl .. 8/?
# Failed test 'check that the slot state changes to "extended"'
# at t/019_replslot_limit.pl line 140.
# got: 'unreserved'
# expected: 'extended'
t/019_replslot_limit.pl .. 26/? # Looks like you failed 1 test of 26.
t/019_replslot_limit.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/26 subtests
With "WAIT FOR LSN" in wait_for_catchup() disabled, 100 iterations
passed.
With extra logging added, I could see the key difference.
Failed run:
2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl LOG: !!!GetWALAvailability| targetLSN: 0/016C0000, targetSeg: 22, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 22
2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl STATEMENT: SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'
vs
Successful run:
2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl LOG: !!!GetWALAvailability| targetLSN: 0/01700000, targetSeg: 23, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 23
2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl STATEMENT: SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'
That is, with WAIT FOR LSN, the primary in this test may advance
slot->data.restart_lsn to the expected position only after
wait_for_catchup() has already returned, so the subsequent wal_status
query can still see the old restart_lsn.
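
To make the two log lines easier to compare, here is a small sketch that
recomputes the segment numbers and the resulting wal_status from the
logged values. It rests on two assumptions that are not stated above: a
WAL segment size of 1 MB (inferred from targetLSN 0/016C0000 mapping to
targetSeg 22), and a simplified paraphrase of the comparisons in
GetWALAvailability() as I understand them. The function names are
illustrative only and are not PostgreSQL APIs.

```python
# Assumption: 1 MB WAL segments, inferred from the logged LSN/segment pairs.
WAL_SEGMENT_SIZE = 1024 * 1024


def parse_lsn(text: str) -> int:
    """Convert an 'X/Y' hexadecimal LSN string into a 64-bit byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_of(lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Return the number of the WAL segment that contains the given LSN."""
    return lsn // seg_size


def wal_status(target_seg: int, oldest_slot_seg: int,
               oldest_seg_max_wal_size: int, oldest_seg: int) -> str:
    """Simplified paraphrase (an assumption) of the GetWALAvailability() checks."""
    if target_seg >= oldest_slot_seg:
        # The slot's segment is still protected from removal.
        if target_seg >= oldest_seg_max_wal_size:
            return "reserved"
        return "extended"
    if target_seg >= oldest_seg:
        # No longer protected, but the segment has not been removed yet.
        return "unreserved"
    return "lost"


# Values taken from the failed and successful runs logged above.
failed_seg = segment_of(parse_lsn("0/016C0000"))
ok_seg = segment_of(parse_lsn("0/01700000"))
print(failed_seg, wal_status(failed_seg, 23, 24, 22))
print(ok_seg, wal_status(ok_seg, 23, 24, 23))
```

Under these assumptions, a restart_lsn that lags by a single segment is
enough to move the reported state from "extended" to "unreserved", which
matches the failure in test 8.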
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=goldfish&dt=2026-05-13%2006%3A15%3A03
Best regards,
Alexander