public inbox for [email protected]  
help / color / mirror / Atom feed
From: Xuneng Zhou <[email protected]>
To: Alexander Korotkov <[email protected]>
Cc: Alexander Lakhin <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Peter Eisentraut <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Álvaro Herrera <[email protected]>
Cc: Chao Li <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: jian he <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Yura Sokolov <[email protected]>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: Tue, 26 May 2026 23:53:14 +0800
Message-ID: <CABPTF7Xp0fUK0LiZVeGSNse3Wb6jRy_c9u2TOOBhmgx3K3KACA@mail.gmail.com> (raw)
In-Reply-To: <CABPTF7XxDonXAcz6DsN6AUJB3swYrZkJHq3UCDaD3Q2H+j0gUA@mail.gmail.com>
References: <CABPTF7Xs-64GQNjmbimZNhj2YSKbBny+evz6=cp3X2fkJS+vMQ@mail.gmail.com>
	<CABPTF7Ub=w7CRxi3sNv8oMGMh4hCqUTohuiTuP9Y1DpxRuFtRQ@mail.gmail.com>
	<CAPpHfduJKv9-R2HcpyX9RNgteLL0M1MPS1No1WLnTsegsbG4MQ@mail.gmail.com>
	<CABPTF7WWxgAAr5fT9TFciU+PzeRpC3Dp7SO60AV9XWx561TNKA@mail.gmail.com>
	<CABPTF7X0n=R50z2fBpj3EbYYz04Ab0-DHJa+JfoAEny62QmUdg@mail.gmail.com>
	<CABPTF7U+SUnJX_woQYGe==R9Oz+-V6X0VO2stBLPGfJmH_LEhw@mail.gmail.com>
	<CABPTF7UcuVD0L-X=jZFfeygjPaZWWkVRwtWOaJw2tcXbEN2xsA@mail.gmail.com>
	<CABPTF7Wdq6KbvC3EhLX3Pz=ODCCPEX7qVQ+E=cokkB91an2E-A@mail.gmail.com>
	<CAPpHfdv_BS7csGyg_=pPanRQM9Sf6_wBWNGdVzJRAv0U4eH9cg@mail.gmail.com>
	<CAPpHfds7oSCbZqob7ytT_Lso8fv-NW8LnedUTE4Krde+3rkJeA@mail.gmail.com>
	<CABPTF7WiDtWCR82geyaaaCRV9UiPR5YUHRNDysD_7Ltr1ymfug@mail.gmail.com>
	<CABPTF7Um7oRKBYmek_3gfbaMg5DLzHZmAX-GEacHASfD34xo5g@mail.gmail.com>
	<CABPTF7V-E_e3kQ2vtwUz6Jy7u-8_YeUT0SDoAbu7EKPgNp=ndA@mail.gmail.com>
	<CAPpHfdtNiSqQCu+YtTYcc+K4q9FwtZuAtQ5Qs+KoaZZM4QyYTA@mail.gmail.com>
	<[email protected]>
	<CABPTF7V=q+OCSHLMV0v78UU=pz6b7rC81vJ5+XKnVkfAebyEwA@mail.gmail.com>
	<CABPTF7X61iuyZT3K3rZssi9qyBsfo+cGs4vWBZ6KexHi1L6hMA@mail.gmail.com>
	<CAPpHfdtQGQpUXoqWpRghhoG_-PbUtxNTgza=am4-HHa=EmXAVQ@mail.gmail.com>
	<CABPTF7XDVHC-FDtZNWep6eCJEiaGOBskmYbVivZGZ2p-V23-QA@mail.gmail.com>
	<CABPTF7VLWuzHeeRYTb0AWb5639hDcW9P2yYSk7v1ESxCGgiUJA@mail.gmail.com>
	<CAPpHfdvu4g2GFOCs9efDv4DV1Of17A2UWQL2S7CQe6rR3m-7_A@mail.gmail.com>
	<CABPTF7XxDonXAcz6DsN6AUJB3swYrZkJHq3UCDaD3Q2H+j0gUA@mail.gmail.com>

On Tue, May 26, 2026 at 9:48 AM Xuneng Zhou <[email protected]> wrote:
>
> On Mon, May 25, 2026 at 5:00 PM Alexander Korotkov <[email protected]> wrote:
> >
> > On Sat, May 23, 2026 at 9:40 PM Xuneng Zhou <[email protected]> wrote:
> > >> > I agree with you.  But do we actually need a
> > >> > wait_for_standby_and_slot_catchup() wrapper.  I think we can call
> > >> > $node->wait_for_slot_catchup() directly and simplify the fix.  Check
> > >> > the attached patch.
> > >> >
> > >>
> > >> The patch looks good to me. I agree that the wait_for_slot_catchup is
> > >> not needed and could be misleading. This change would make the exact
> > >> synchronization point and its intention clearer. The only price we
> > >> need to pay here is bringing back the polling. But it seems acceptable
> > >> since the cost was there in the pre-wait-for-lsn era. And thanks for
> > >> writing the great commit message!
> > >
> > >
> > > Sorry for copy-pasting the wrong function name. It should be wait_for_catchup().
> >
> > Good, thank you.  I'll push it if no objections.
>
> While reading 019_replslot_limit.pl, Codex pointed out a few
> inconsistencies in the comments. I verified them and they look real.
> Would you mind doing a small cleanup as well?

I updated the comment above wait_for_slot_catchup to reflect its usage.

-- 
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.


Attachments:

  [application/octet-stream] v3-0001-Stabilize-019_replslot_limit.pl-after-wait_for_ca.patch (3.7K, 2-v3-0001-Stabilize-019_replslot_limit.pl-after-wait_for_ca.patch)
  download | inline diff:
From cbbe124cd5a397764721f48e27de4020f8b46603 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 26 May 2026 09:37:24 +0800
Subject: [PATCH v3 1/2] Stabilize 019_replslot_limit.pl after
 wait_for_catchup() semantic change

wait_for_catchup() now returns as soon as the standby has replayed the
target LSN locally, rather than waiting until the primary observes that
position via pg_stat_replication. 019_replslot_limit.pl, however,
checks primary-side pg_replication_slots state, which depends on the
slot's restart_lsn -- and restart_lsn advances only after the primary's
walsender processes a standby reply. The previous polling
wait_for_catchup() implicitly waited for that round trip; the
WAIT FOR LSN-based one does not, so the subtests "check that the slot
state changes to 'extended' / 'unreserved'" become flappy (reproducible
with small artificial delays in XLogWalRcvSendReply /
ProcessStandbyReplyMessage).

Replace each wait_for_catchup() in this test with
wait_for_slot_catchup('rep1', 'restart', primary->lsn('write')).
restart_lsn cannot move ahead of the standby's replayed position, so
this single wait transitively covers both the standby replay and the
primary's observation of it, which is exactly the precondition the
slot-state assertions require.

Reported-by: Alexander Lakhin <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Author: Xuneng Zhou <[email protected]>
Author: Alexander Korotkov <[email protected]>
---
 src/test/recovery/t/019_replslot_limit.pl | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 7b253e64d9c..882ffb66550 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -44,8 +44,12 @@ $node_standby->append_conf('postgresql.conf', "primary_slot_name = 'rep1'");
 
 $node_standby->start;
 
-# Wait until standby has replayed enough data
-$node_primary->wait_for_catchup($node_standby);
+# Wait until the primary has processed standby feedback and advanced the
+# slot's restart_lsn.  For a physical slot, restart_lsn is updated from
+# the standby's reported flush position, so this waits for the primary-side
+# slot state that the following wal_status checks depend on.
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 
 # Stop standby
 $node_standby->stop;
@@ -79,7 +83,8 @@ is($result, "reserved|t", 'check that slot is working');
 # The standby can reconnect to primary
 $node_standby->start;
 
-$node_primary->wait_for_catchup($node_standby);
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 
 $node_standby->stop;
 
@@ -109,7 +114,8 @@ is($result, "reserved",
 
 # The standby can reconnect to primary
 $node_standby->start;
-$node_primary->wait_for_catchup($node_standby);
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 $node_standby->stop;
 
 # wal_keep_size overrides max_slot_wal_keep_size
@@ -128,7 +134,8 @@ $result = $node_primary->safe_psql('postgres',
 
 # The standby can reconnect to primary
 $node_standby->start;
-$node_primary->wait_for_catchup($node_standby);
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 $node_standby->stop;
 
 # Advance WAL again without checkpoint, reducing remain by 6 MB.
@@ -155,7 +162,8 @@ is($result, "unreserved|t",
 # The standby still can connect to primary before a checkpoint
 $node_standby->start;
 
-$node_primary->wait_for_catchup($node_standby);
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 
 $node_standby->stop;
 
-- 
2.51.0



  [application/octet-stream] v3-0002-Clean-up-replslot-limit-test-comments.patch (2.6K, 3-v3-0002-Clean-up-replslot-limit-test-comments.patch)
  download | inline diff:
From a50f27e72ba8b0f99ad67d2f66c1db5ddd55deb8 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 26 May 2026 09:40:04 +0800
Subject: [PATCH v3 2/2] Clean up replslot limit test comments

Update stale comments and test names in 019_replslot_limit.pl to match
the actual WAL advancement and wal_status checks. Remove a redundant
standby stop in the inactive_since coverage.
---
 src/test/recovery/t/019_replslot_limit.pl | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 882ffb66550..a412faf51c6 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -60,7 +60,7 @@ $result = $node_primary->safe_psql('postgres',
 );
 is($result, "reserved|t", 'check the catching-up state');
 
-# Advance WAL by five segments (= 5MB) on primary
+# Advance WAL by one segment (= 1MB) on primary
 $node_primary->advance_wal(1);
 $node_primary->safe_psql('postgres', "CHECKPOINT;");
 
@@ -110,7 +110,7 @@ $node_primary->safe_psql('postgres', "CHECKPOINT;");
 $result = $node_primary->safe_psql('postgres',
 	"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'");
 is($result, "reserved",
-	'check that safe_wal_size gets close to the current LSN');
+	'check that slot remains reserved after advancing WAL');
 
 # The standby can reconnect to primary
 $node_standby->start;
@@ -121,7 +121,7 @@ $node_standby->stop;
 # wal_keep_size overrides max_slot_wal_keep_size
 $result = $node_primary->safe_psql('postgres',
 	"ALTER SYSTEM SET wal_keep_size to '8MB'; SELECT pg_reload_conf();");
-# Advance WAL again then checkpoint, reducing remain by 6 MB.
+# Advance WAL again, reducing remain by 6 MB.
 $node_primary->advance_wal(6);
 $result = $node_primary->safe_psql('postgres',
 	"SELECT wal_status as remain FROM pg_replication_slots WHERE slot_name = 'rep1'"
@@ -141,7 +141,7 @@ $node_standby->stop;
 # Advance WAL again without checkpoint, reducing remain by 6 MB.
 $node_primary->advance_wal(6);
 
-# Slot gets into 'reserved' state
+# Slot gets into 'extended' state
 $result = $node_primary->safe_psql('postgres',
 	"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'");
 is($result, "extended", 'check that the slot state changes to "extended"');
@@ -480,8 +480,6 @@ is( $primary4->safe_psql(
 	't',
 	'last inactive time for an inactive physical slot is updated correctly');
 
-$standby4->stop;
-
 # Testcase end: Check inactive_since property of the streaming standby's slot
 # =============================================================================
 
-- 
2.51.0



reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Implement waiting for wal lsn replay: reloaded
  In-Reply-To: <CABPTF7Xp0fUK0LiZVeGSNse3Wb6jRy_c9u2TOOBhmgx3K3KACA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox