Re: Implement waiting for wal lsn replay: reloaded

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Alexander Korotkov <[email protected]>
To: Xuneng Zhou <[email protected]>
Cc: Alexander Lakhin <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Peter Eisentraut <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Álvaro Herrera <[email protected]>
Cc: Chao Li <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: jian he <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Yura Sokolov <[email protected]>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: Wed, 20 May 2026 15:16:45 +0300
Message-ID: <CAPpHfdtQGQpUXoqWpRghhoG_-PbUtxNTgza=am4-HHa=EmXAVQ@mail.gmail.com> (raw)
In-Reply-To: <CABPTF7X61iuyZT3K3rZssi9qyBsfo+cGs4vWBZ6KexHi1L6hMA@mail.gmail.com>
References: <CABPTF7Xs-64GQNjmbimZNhj2YSKbBny+evz6=cp3X2fkJS+vMQ@mail.gmail.com>
	<CABPTF7Ub=w7CRxi3sNv8oMGMh4hCqUTohuiTuP9Y1DpxRuFtRQ@mail.gmail.com>
	<CAPpHfduJKv9-R2HcpyX9RNgteLL0M1MPS1No1WLnTsegsbG4MQ@mail.gmail.com>
	<CABPTF7WWxgAAr5fT9TFciU+PzeRpC3Dp7SO60AV9XWx561TNKA@mail.gmail.com>
	<CABPTF7X0n=R50z2fBpj3EbYYz04Ab0-DHJa+JfoAEny62QmUdg@mail.gmail.com>
	<CABPTF7U+SUnJX_woQYGe==R9Oz+-V6X0VO2stBLPGfJmH_LEhw@mail.gmail.com>
	<CABPTF7UcuVD0L-X=jZFfeygjPaZWWkVRwtWOaJw2tcXbEN2xsA@mail.gmail.com>
	<CABPTF7Wdq6KbvC3EhLX3Pz=ODCCPEX7qVQ+E=cokkB91an2E-A@mail.gmail.com>
	<CAPpHfdv_BS7csGyg_=pPanRQM9Sf6_wBWNGdVzJRAv0U4eH9cg@mail.gmail.com>
	<CAPpHfds7oSCbZqob7ytT_Lso8fv-NW8LnedUTE4Krde+3rkJeA@mail.gmail.com>
	<CABPTF7WiDtWCR82geyaaaCRV9UiPR5YUHRNDysD_7Ltr1ymfug@mail.gmail.com>
	<CABPTF7Um7oRKBYmek_3gfbaMg5DLzHZmAX-GEacHASfD34xo5g@mail.gmail.com>
	<CABPTF7V-E_e3kQ2vtwUz6Jy7u-8_YeUT0SDoAbu7EKPgNp=ndA@mail.gmail.com>
	<CAPpHfdtNiSqQCu+YtTYcc+K4q9FwtZuAtQ5Qs+KoaZZM4QyYTA@mail.gmail.com>
	<[email protected]>
	<CABPTF7V=q+OCSHLMV0v78UU=pz6b7rC81vJ5+XKnVkfAebyEwA@mail.gmail.com>
	<CABPTF7X61iuyZT3K3rZssi9qyBsfo+cGs4vWBZ6KexHi1L6hMA@mail.gmail.com>

Hi, Xuneng!

On Wed, May 20, 2026 at 8:18 AM Xuneng Zhou <[email protected]> wrote:
> On Tue, May 19, 2026 at 8:30 PM Xuneng Zhou <[email protected]> wrote:
> >
> > Hi,
> >
> > On Tue, May 19, 2026 at 1:00 PM Alexander Lakhin <[email protected]> wrote:
> > >
> > > Hello Alexander and Xuneng,
> > >
> > > 06.04.2026 22:49, Alexander Korotkov wrote:
> > >
> > > Thank you, I've pushed your version of patchset.  I made two minor
> > > corrections for patch #2: mention default mode value in the header
> > > comment, and fallback to polling on has_wal_read_bug sparc64+ext4 bug.
> > >
> > >
> > > I discovered a new test failure, that is apparently caused by new
> > > wait_for_catchup() implementation [1]:
> > > [06:20:23.110](1.069s) not ok 8 - check that the slot state changes to "extended"
> > > [06:20:23.110](0.001s) #   Failed test 'check that the slot state changes to "extended"'
> > > #   at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 140.
> > > [06:20:23.111](0.000s) #          got: 'unreserved'
> > > #     expected: 'extended'
> > > [06:20:23.231](0.120s) not ok 9 - check that the slot state changes to "unreserved"
> > > [06:20:23.231](0.000s) #   Failed test 'check that the slot state changes to "unreserved"'
> > > #   at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 152.
> > > [06:20:23.231](0.000s) #          got: 'lost|'
> > > #     expected: 'unreserved|t'
> > >
> > > I've managed to reproduce such failures with:
> > > diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
> > > index 07eac07b9ce..493ce92674e 100644
> > > --- a/src/backend/replication/walreceiver.c
> > > +++ b/src/backend/replication/walreceiver.c
> > > @@ -1143,2 +1143,3 @@ XLogWalRcvSendReply(bool force, bool requestReply, bool checkApply)
> > >
> > > +pg_usleep(10000);
> > >      /* Get current timestamp. */
> > > diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
> > > index 04aa770d981..19cda3a6b51 100644
> > > --- a/src/backend/replication/walsender.c
> > > +++ b/src/backend/replication/walsender.c
> > > @@ -2521,2 +2521,3 @@ ProcessStandbyReplyMessage(void)
> > >
> > > +pg_usleep(100000);
> > >      /* the caller already consumed the msgtype byte */
> > >
> > > Concretely, a loop:
> > > for i in {1..100}; do echo "ITERATION $i"; PROVE_TESTS="t/019*" make -s check -C src/test/recovery/ || break; done
> > > failed for me on iterations 2, 1, 7:
> > > ITERATION 7
> > > # +++ tap check in src/test/recovery +++
> > > t/019_replslot_limit.pl .. 8/?
> > > #   Failed test 'check that the slot state changes to "extended"'
> > > #   at t/019_replslot_limit.pl line 140.
> > > #          got: 'unreserved'
> > > #     expected: 'extended'
> > > t/019_replslot_limit.pl .. 26/? # Looks like you failed 1 test of 26.
> > > t/019_replslot_limit.pl .. Dubious, test returned 1 (wstat 256, 0x100)
> > > Failed 1/26 subtests
> > >
> > > With "WAIT FOR LSN" in wait_for_catchup() disabled, 100 iterations
> > > passed.
> > >
> > > Having extra logging added, I could see the key difference.
> > > Failed run:
> > > 2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl LOG:  !!!GetWALAvailability| targetLSN: 0/016C0000, targetSeg: 22, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 22
> > > 2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl STATEMENT:  SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'
> > > vs
> > > Successful run:
> > > 2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl LOG:  !!!GetWALAvailability| targetLSN: 0/01700000, targetSeg: 23, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 23
> > > 2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl STATEMENT:  SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'
> > >
> > > That is, with WAIT FOR LSN, primary in this test may advance
> > > slot->data.restart_lsn to the expected position after wait_for_catchup()
> > > returns.
> > >
> > > [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=goldfish&dt=2026-05-13%2006%3A15%3A03
> >
> > Thanks for reporting this issue.
> >
> > I think this is related to the semantic change made earlier:
> > wait_for_catchup() now returns once the standby itself has reached the
> > target LSN, rather than waiting until the primary observes that
> > position via pg_stat_replication.
> >
> > As a result, the primary may not yet have processed the standby
> > feedback needed to advance the slot's restart_lsn when
> > wait_for_catchup() returns.
> >
> > Actually, I was aware of this semantic change and previously thought
> > it might be harmless. But this failure appears to disprove that. I'll
> > prepare a patch to fix this shortly.
>
> After some consideration, 019_replslot_limit.pl appears to the better
> place to place the fix rather than by restoring the old primary-side
> polling barrier in wait_for_catchup().
>
> The new wait_for_catchup() behavior is closer to its natural
> semantics: for replay/write/flush modes, it waits until the standby
> itself has reached the requested LSN. The old implementation used
> pg_stat_replication on the primary, which also implied that the
> primary had received and processed standby feedback. That was a useful
> side effect for this test, but it is not required by most callers.
>
> 019_replslot_limit.pl is different because it checks primary-side slot
> state. For a physical slot, restart_lsn advances only after the
> primary's walsender processes standby feedback. So the test needs an
> extra condition beyond ordinary standby catchup.
>
> The patch makes that dependency explicit: wait for the standby to
> replay the target LSN, then wait for the slot's restart_lsn on the
> primary to pass the same LSN. This keeps wait_for_catchup() focused on
> standby catchup while making the slot-specific synchronization visible
> in the test that needs it.

I agree with you.  But do we actually need a
wait_for_standby_and_slot_catchup() wrapper.  I think we can call
$node->wait_for_slot_catchup() directly and simplify the fix.  Check
the attached patch.

------
Regards,
Alexander Korotkov
Supabase


Attachments:

  [application/octet-stream] v2-0001-Stabilize-019_replslot_limit.pl-after-wait_for_ca.patch (3.7K, 2-v2-0001-Stabilize-019_replslot_limit.pl-after-wait_for_ca.patch)
  download | inline diff:
From 62335545e68ac14335467d3cfe5f4b3fdd2d5d25 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <[email protected]>
Date: Wed, 20 May 2026 15:12:37 +0300
Subject: [PATCH v2] Stabilize 019_replslot_limit.pl after wait_for_catchup()
 semantic change

wait_for_catchup() now returns as soon as the standby has replayed the
target LSN locally, rather than waiting until the primary observes that
position via pg_stat_replication.  019_replslot_limit.pl, however,
checks primary-side pg_replication_slots state, which depends on the
slot's restart_lsn -- and restart_lsn advances only after the primary's
walsender processes a standby reply.  The previous polling
wait_for_catchup() implicitly waited for that round trip; the
WAIT FOR LSN-based one does not, so the subtests "check that the slot
state changes to 'extended' / 'unreserved'" become flappy (reproducible
with small artificial delays in XLogWalRcvSendReply /
ProcessStandbyReplyMessage).

Replace each wait_for_catchup() in this test with
wait_for_slot_catchup('rep1', 'restart', primary->lsn('write')).
restart_lsn cannot move ahead of the standby's replayed position, so
this single wait transitively covers both the standby replay and the
primary's observation of it, which is exactly the precondition the
slot-state assertions require.

Reported-by: Alexander Lakhin <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Author: Xuneng Zhou <[email protected]>
Author: Alexander Korotkov <[email protected]>
---
 src/test/recovery/t/019_replslot_limit.pl | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 7b253e64d9c..472aa07587f 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -44,8 +44,12 @@ $node_standby->append_conf('postgresql.conf', "primary_slot_name = 'rep1'");
 
 $node_standby->start;
 
-# Wait until standby has replayed enough data
-$node_primary->wait_for_catchup($node_standby);
+# Wait until the primary has processed standby feedback and advanced
+# the slot's restart_lsn.  restart_lsn moves only after the standby's
+# reply reaches the walsender, so this transitively guarantees that
+# the standby itself has replayed past the target LSN.
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 
 # Stop standby
 $node_standby->stop;
@@ -79,7 +83,8 @@ is($result, "reserved|t", 'check that slot is working');
 # The standby can reconnect to primary
 $node_standby->start;
 
-$node_primary->wait_for_catchup($node_standby);
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 
 $node_standby->stop;
 
@@ -109,7 +114,8 @@ is($result, "reserved",
 
 # The standby can reconnect to primary
 $node_standby->start;
-$node_primary->wait_for_catchup($node_standby);
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 $node_standby->stop;
 
 # wal_keep_size overrides max_slot_wal_keep_size
@@ -128,7 +134,8 @@ $result = $node_primary->safe_psql('postgres',
 
 # The standby can reconnect to primary
 $node_standby->start;
-$node_primary->wait_for_catchup($node_standby);
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 $node_standby->stop;
 
 # Advance WAL again without checkpoint, reducing remain by 6 MB.
@@ -155,7 +162,8 @@ is($result, "unreserved|t",
 # The standby still can connect to primary before a checkpoint
 $node_standby->start;
 
-$node_primary->wait_for_catchup($node_standby);
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 
 $node_standby->stop;
 
-- 
2.39.5 (Apple Git-154)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Implement waiting for wal lsn replay: reloaded
  In-Reply-To: <CAPpHfdtQGQpUXoqWpRghhoG_-PbUtxNTgza=am4-HHa=EmXAVQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox