Re: Implement waiting for wal lsn replay: reloaded

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Implement waiting for wal lsn replay: reloaded
20+ messages / 4 participants
[nested] [flat]

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-01-07 00:32  Andres Freund <[email protected]>
  0 siblings, 2 replies; 20+ messages in thread

From: Andres Freund @ 2026-01-07 00:32 UTC (permalink / raw)
  To: Thomas Munro <[email protected]>; +Cc: Xuneng Zhou <[email protected]>; Alexander Korotkov <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

Hi,

On 2026-01-06 18:42:59 +1300, Thomas Munro wrote:
> Could this be causing the recent flapping failures on CI/macOS in
> recovery/031_recovery_conflict?  I didn't have time to dig personally
> but f30848cb looks relevant:
> 
> Waiting for replication conn standby's replay_lsn to pass 0/03467F58 on primary
> error running SQL: 'psql:<stdin>:1: ERROR:  canceling statement due to
> conflict with recovery
> DETAIL:  User was or might have been using tablespace that must be dropped.'
> while running 'psql --no-psqlrc --no-align --tuples-only --quiet
> --dbname port=25195
> host=/var/folders/g9/7rkt8rt1241bwwhd3_s8ndp40000gn/T/LqcCJnsueI
> dbname='postgres' --file - --variable ON_ERROR_STOP=1' with sql 'WAIT
> FOR LSN '0/03467F58' WITH (MODE 'standby_replay', timeout '180s',
> no_throw);' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
> line 2300.
> 
> https://cirrus-ci.com/task/5771274900733952
> 
> The master branch in time-descending order, macOS tasks only:
> 
>      task_id      | substring |  status
> ------------------+-----------+-----------
>  6460882231754752 | c970bdc0  | FAILED
>  5771274900733952 | 6ca8506e  | FAILED
>  6217757068361728 | 63ed3bc7  | FAILED
>  5980650261446656 | ae283736  | FAILED
>  6585898394976256 | 5f13999a  | COMPLETED
>  4527474786172928 | 7f9acc9b  | COMPLETED
>  4826100842364928 | e8d4e94a  | COMPLETED
>  4540563027918848 | b9ee5f2d  | FAILED
>  6358528648019968 | c5af141c  | FAILED
>  5998005284765696 | e212a0f8  | COMPLETED
>  6488580526178304 | b85d5dc0  | FAILED
>  5034091344560128 | 7dc95cc3  | ABORTED
>  5688692477526016 | bb048e31  | COMPLETED
>  5481187977723904 | d351063e  | COMPLETED
>  5101831568752640 | f30848cb  | COMPLETED <-- the change
>  6395317408497664 | 3f33b63d  | COMPLETED
>  6741325208354816 | 877ae5db  | COMPLETED
>  4594007789010944 | de746e0d  | COMPLETED
>  6497208998035456 | 461b8cc9  | COMPLETED

The failure rates of this are very high - the majority of the CI runs on the
postgres/postgres repos failed since the change went in. Which then also means
cfbot has a very high spurious failure rate. I think we need to revert this
change until the problem has been verified as fixed.

Greetings,

Andres Freund






^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-01-07 04:08  Xuneng Zhou <[email protected]>
  parent: Andres Freund <[email protected]>
  1 sibling, 1 reply; 20+ messages in thread

From: Xuneng Zhou @ 2026-01-07 04:08 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Thomas Munro <[email protected]>; Alexander Korotkov <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

Hi,

On Wed, Jan 7, 2026 at 8:32 AM Andres Freund <[email protected]> wrote:
>
> Hi,
>
> On 2026-01-06 18:42:59 +1300, Thomas Munro wrote:
> > Could this be causing the recent flapping failures on CI/macOS in
> > recovery/031_recovery_conflict?  I didn't have time to dig personally
> > but f30848cb looks relevant:
> >
> > Waiting for replication conn standby's replay_lsn to pass 0/03467F58 on primary
> > error running SQL: 'psql:<stdin>:1: ERROR:  canceling statement due to
> > conflict with recovery
> > DETAIL:  User was or might have been using tablespace that must be dropped.'
> > while running 'psql --no-psqlrc --no-align --tuples-only --quiet
> > --dbname port=25195
> > host=/var/folders/g9/7rkt8rt1241bwwhd3_s8ndp40000gn/T/LqcCJnsueI
> > dbname='postgres' --file - --variable ON_ERROR_STOP=1' with sql 'WAIT
> > FOR LSN '0/03467F58' WITH (MODE 'standby_replay', timeout '180s',
> > no_throw);' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
> > line 2300.
> >
> > https://cirrus-ci.com/task/5771274900733952
> >
> > The master branch in time-descending order, macOS tasks only:
> >
> >      task_id      | substring |  status
> > ------------------+-----------+-----------
> >  6460882231754752 | c970bdc0  | FAILED
> >  5771274900733952 | 6ca8506e  | FAILED
> >  6217757068361728 | 63ed3bc7  | FAILED
> >  5980650261446656 | ae283736  | FAILED
> >  6585898394976256 | 5f13999a  | COMPLETED
> >  4527474786172928 | 7f9acc9b  | COMPLETED
> >  4826100842364928 | e8d4e94a  | COMPLETED
> >  4540563027918848 | b9ee5f2d  | FAILED
> >  6358528648019968 | c5af141c  | FAILED
> >  5998005284765696 | e212a0f8  | COMPLETED
> >  6488580526178304 | b85d5dc0  | FAILED
> >  5034091344560128 | 7dc95cc3  | ABORTED
> >  5688692477526016 | bb048e31  | COMPLETED
> >  5481187977723904 | d351063e  | COMPLETED
> >  5101831568752640 | f30848cb  | COMPLETED <-- the change
> >  6395317408497664 | 3f33b63d  | COMPLETED
> >  6741325208354816 | 877ae5db  | COMPLETED
> >  4594007789010944 | de746e0d  | COMPLETED
> >  6497208998035456 | 461b8cc9  | COMPLETED
>
> The failure rates of this are very high - the majority of the CI runs on the
> postgres/postgres repos failed since the change went in. Which then also means
> cfbot has a very high spurious failure rate. I think we need to revert this
> change until the problem has been verified as fixed.

This specific failure can be reproduced with this patch v1.

I guess the potential race condition is: when
wait_for_replay_catchup() runs WAIT FOR LSN on the standby, if a
tablespace conflict fires during that wait, the WAIT FOR LSN session
is killed even though it doesn't use the tablespace.

In my test, the failure won't occur after applying the v2 patch.

-- 
Best,
Xuneng


Attachments:

  [application/octet-stream] v1-0001-reproduce-the-failure-in-031_recovery_conflict.pl.patch (1.7K, 2-v1-0001-reproduce-the-failure-in-031_recovery_conflict.pl.patch)
  download | inline diff:
From ca73929687f9bf7d4aaa258f8e413ff2c3eea6aa Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Wed, 7 Jan 2026 11:39:41 +0800
Subject: [PATCH v1] reproduce the failure in 031_recovery_conflict.pl

---
 src/test/recovery/t/031_recovery_conflict.pl | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/031_recovery_conflict.pl b/src/test/recovery/t/031_recovery_conflict.pl
index 7a740f69806..39061fcc0a8 100644
--- a/src/test/recovery/t/031_recovery_conflict.pl
+++ b/src/test/recovery/t/031_recovery_conflict.pl
@@ -198,10 +198,25 @@ like($res, qr/^6000$/m,
 	"$sect: cursor with conflicting temp file established");
 
 # Drop the tablespace currently containing spill files for the query on the
-# standby
+# standby.  We pause replay before the DROP, then resume it via a background
+# session.  This forces wait_for_replay_catchup's internal WAIT FOR LSN to be
+# running when the conflict fires, exercising the recovery conflict handling
+# in Cluster.pm.
+$node_standby->safe_psql('postgres', "SELECT pg_wal_replay_pause()");
 $node_primary->safe_psql($test_db, qq[DROP TABLESPACE $tablespace1;]);
 
+# Start a background session that waits 1 second then resumes replay.
+# This triggers the conflict while wait_for_replay_catchup is running.
+my $resume_session = $node_standby->background_psql('postgres');
+$resume_session->query_until(
+	qr/start/, qq[
+	\\echo start
+	SELECT pg_sleep(1);
+	SELECT pg_wal_replay_resume();
+]);
+
 $node_primary->wait_for_replay_catchup($node_standby);
+$resume_session->quit;
 
 check_conflict_log(
 	"User was or might have been using tablespace that must be dropped");
-- 
2.51.0



  [application/octet-stream] v2-0001-Fix-wait_for_catchup-failure-when-standby-session.patch (4.4K, 3-v2-0001-Fix-wait_for_catchup-failure-when-standby-session.patch)
  download | inline diff:
From 1eaf36cbfafb75c91734615529dcc8f0ed7d7999 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 6 Jan 2026 20:55:43 +0800
Subject: [PATCH v2] Fix wait_for_catchup() failure when standby session is
 killed by recovery conflict

Commit f30848cb optimized wait_for_catchup() to use WAIT FOR LSN on
the standby instead of polling pg_stat_replication on the primary.
However, this introduced a failure mode: the WAIT FOR LSN session
can be killed by recovery conflicts on the standby, causing the
test helper to die unexpectedly.

This manifests as flapping failures in tests like 031_recovery_conflict,
where DROP TABLESPACE on the primary triggers
ResolveRecoveryConflictWithTablespace() on the standby. That function
kills all backends indiscriminately, including the innocent WAIT FOR
LSN session that happens to be connected at that moment.

Fix by wrapping the WAIT FOR LSN call in an eval block and falling
back to the original polling approach when the session is killed by
a recovery conflict. The fallback is selective:

- If WAIT FOR LSN succeeds with 'success': return immediately
- If WAIT FOR LSN returns non-success (timeout, not_in_recovery):
  fail immediately with diagnostics
- If the session is killed by a recovery conflict (error contains
  "conflict with recovery"): fall back to polling on the primary
- For any other error: fail immediately to avoid masking real problems

The polling fallback is immune to standby-side conflicts because it
queries pg_stat_replication on the primary, not the standby.
---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 53 +++++++++++++++++++-----
 1 file changed, 42 insertions(+), 11 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index a28ea89aa10..08379aeb8fb 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -3401,22 +3401,52 @@ sub wait_for_catchup
 			my $timeout = $PostgreSQL::Test::Utils::timeout_default;
 			my $wait_query =
 			  qq[WAIT FOR LSN '${target_lsn}' WITH (MODE '${wait_mode}', timeout '${timeout}s', no_throw);];
-			my $output = $standby_node->safe_psql('postgres', $wait_query);
-			chomp($output);
 
-			if ($output ne 'success')
+			# Try WAIT FOR LSN. If it succeeds, we're done. If it returns a
+			# non-success status (timeout, not_in_recovery), fail immediately.
+			# If the session is interrupted (e.g., killed by recovery conflict),
+			# fall back to polling on the upstream which is immune to standby-
+			# side conflicts.
+			my $output;
+			local $@;
+			my $wait_succeeded = eval {
+				$output = $standby_node->safe_psql('postgres', $wait_query);
+				chomp($output);
+				1;
+			};
+
+			if ($wait_succeeded && $output eq 'success')
+			{
+				print "done\n";
+				return;
+			}
+
+			# If WAIT FOR LSN executed but returned non-success (e.g., timeout,
+			# not_in_recovery), fail immediately with diagnostic info. Falling
+			# back to polling would just waste time.
+			if ($wait_succeeded)
 			{
-				# Fetch additional detail for debugging purposes
 				my $details = $self->safe_psql('postgres',
 					"SELECT * FROM pg_catalog.pg_stat_replication");
-				diag qq(WAIT FOR LSN failed with status:
-	${output});
-				diag qq(Last pg_stat_replication contents:
-	${details});
-				croak "failed waiting for catchup";
+				diag qq(WAIT FOR LSN returned '$output'
+pg_stat_replication on upstream:
+${details});
+				croak "WAIT FOR LSN '$wait_mode' returned '$output'";
+			}
+
+			# WAIT FOR LSN was interrupted. Only fall back to polling if this
+			# looks like a recovery conflict - the canonical PostgreSQL error
+			# message contains "conflict with recovery". Other errors should
+			# fail immediately rather than being masked by a silent fallback.
+			if ($@ =~ /conflict with recovery/i)
+			{
+				diag qq(WAIT FOR LSN interrupted, falling back to polling:
+$@);
+			}
+			else
+			{
+				croak "WAIT FOR LSN failed: $@";
 			}
-			print "done\n";
-			return;
 		}
 	}
 
@@ -3424,6 +3454,7 @@ sub wait_for_catchup
 	# - 'sent' mode (no corresponding WAIT FOR LSN mode)
 	# - When standby_name is a string (e.g., subscription name)
 	# - When the standby is no longer in recovery (was promoted)
+	# - When WAIT FOR LSN was interrupted (e.g., killed by a recovery conflict)
 	my $query = qq[SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming'
          FROM pg_catalog.pg_stat_replication
          WHERE application_name IN ('$standby_name', 'walreceiver')];
-- 
2.51.0



^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-01-07 08:06  Alexander Korotkov <[email protected]>
  parent: Andres Freund <[email protected]>
  1 sibling, 0 replies; 20+ messages in thread

From: Alexander Korotkov @ 2026-01-07 08:06 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Thomas Munro <[email protected]>; Xuneng Zhou <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

On Wed, Jan 7, 2026, 02:32 Andres Freund <[email protected]> wrote:

> Hi,
>
> On 2026-01-06 18:42:59 +1300, Thomas Munro wrote:
> > Could this be causing the recent flapping failures on CI/macOS in
> > recovery/031_recovery_conflict?  I didn't have time to dig personally
> > but f30848cb looks relevant:
> >
> > Waiting for replication conn standby's replay_lsn to pass 0/03467F58 on
> primary
> > error running SQL: 'psql:<stdin>:1: ERROR:  canceling statement due to
> > conflict with recovery
> > DETAIL:  User was or might have been using tablespace that must be
> dropped.'
> > while running 'psql --no-psqlrc --no-align --tuples-only --quiet
> > --dbname port=25195
> > host=/var/folders/g9/7rkt8rt1241bwwhd3_s8ndp40000gn/T/LqcCJnsueI
> > dbname='postgres' --file - --variable ON_ERROR_STOP=1' with sql 'WAIT
> > FOR LSN '0/03467F58' WITH (MODE 'standby_replay', timeout '180s',
> > no_throw);' at
> /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
> > line 2300.
> >
> > https://cirrus-ci.com/task/5771274900733952
> >
> > The master branch in time-descending order, macOS tasks only:
> >
> >      task_id      | substring |  status
> > ------------------+-----------+-----------
> >  6460882231754752 | c970bdc0  | FAILED
> >  5771274900733952 | 6ca8506e  | FAILED
> >  6217757068361728 | 63ed3bc7  | FAILED
> >  5980650261446656 | ae283736  | FAILED
> >  6585898394976256 | 5f13999a  | COMPLETED
> >  4527474786172928 | 7f9acc9b  | COMPLETED
> >  4826100842364928 | e8d4e94a  | COMPLETED
> >  4540563027918848 | b9ee5f2d  | FAILED
> >  6358528648019968 | c5af141c  | FAILED
> >  5998005284765696 | e212a0f8  | COMPLETED
> >  6488580526178304 | b85d5dc0  | FAILED
> >  5034091344560128 | 7dc95cc3  | ABORTED
> >  5688692477526016 | bb048e31  | COMPLETED
> >  5481187977723904 | d351063e  | COMPLETED
> >  5101831568752640 | f30848cb  | COMPLETED <-- the change
> >  6395317408497664 | 3f33b63d  | COMPLETED
> >  6741325208354816 | 877ae5db  | COMPLETED
> >  4594007789010944 | de746e0d  | COMPLETED
> >  6497208998035456 | 461b8cc9  | COMPLETED
>
> The failure rates of this are very high - the majority of the CI runs on
> the
> postgres/postgres repos failed since the change went in. Which then also
> means
> cfbot has a very high spurious failure rate. I think we need to revert this
> change until the problem has been verified as fixed.
>

This is fair. I will revert the commit causing the failures in the next few
hours.

------
Regards,
Alexander Korotkov

>


^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-01-08 14:19  Alexander Korotkov <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Alexander Korotkov @ 2026-01-08 14:19 UTC (permalink / raw)
  To: Xuneng Zhou <[email protected]>; +Cc: Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

On Wed, Jan 7, 2026 at 6:08 AM Xuneng Zhou <[email protected]> wrote:
> On Wed, Jan 7, 2026 at 8:32 AM Andres Freund <[email protected]> wrote:
> > On 2026-01-06 18:42:59 +1300, Thomas Munro wrote:
> > > Could this be causing the recent flapping failures on CI/macOS in
> > > recovery/031_recovery_conflict?  I didn't have time to dig personally
> > > but f30848cb looks relevant:
> > >
> > > Waiting for replication conn standby's replay_lsn to pass 0/03467F58 on primary
> > > error running SQL: 'psql:<stdin>:1: ERROR:  canceling statement due to
> > > conflict with recovery
> > > DETAIL:  User was or might have been using tablespace that must be dropped.'
> > > while running 'psql --no-psqlrc --no-align --tuples-only --quiet
> > > --dbname port=25195
> > > host=/var/folders/g9/7rkt8rt1241bwwhd3_s8ndp40000gn/T/LqcCJnsueI
> > > dbname='postgres' --file - --variable ON_ERROR_STOP=1' with sql 'WAIT
> > > FOR LSN '0/03467F58' WITH (MODE 'standby_replay', timeout '180s',
> > > no_throw);' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
> > > line 2300.
> > >
> > > https://cirrus-ci.com/task/5771274900733952
> > >
> > > The master branch in time-descending order, macOS tasks only:
> > >
> > >      task_id      | substring |  status
> > > ------------------+-----------+-----------
> > >  6460882231754752 | c970bdc0  | FAILED
> > >  5771274900733952 | 6ca8506e  | FAILED
> > >  6217757068361728 | 63ed3bc7  | FAILED
> > >  5980650261446656 | ae283736  | FAILED
> > >  6585898394976256 | 5f13999a  | COMPLETED
> > >  4527474786172928 | 7f9acc9b  | COMPLETED
> > >  4826100842364928 | e8d4e94a  | COMPLETED
> > >  4540563027918848 | b9ee5f2d  | FAILED
> > >  6358528648019968 | c5af141c  | FAILED
> > >  5998005284765696 | e212a0f8  | COMPLETED
> > >  6488580526178304 | b85d5dc0  | FAILED
> > >  5034091344560128 | 7dc95cc3  | ABORTED
> > >  5688692477526016 | bb048e31  | COMPLETED
> > >  5481187977723904 | d351063e  | COMPLETED
> > >  5101831568752640 | f30848cb  | COMPLETED <-- the change
> > >  6395317408497664 | 3f33b63d  | COMPLETED
> > >  6741325208354816 | 877ae5db  | COMPLETED
> > >  4594007789010944 | de746e0d  | COMPLETED
> > >  6497208998035456 | 461b8cc9  | COMPLETED
> >
> > The failure rates of this are very high - the majority of the CI runs on the
> > postgres/postgres repos failed since the change went in. Which then also means
> > cfbot has a very high spurious failure rate. I think we need to revert this
> > change until the problem has been verified as fixed.
>
> This specific failure can be reproduced with this patch v1.
>
> I guess the potential race condition is: when
> wait_for_replay_catchup() runs WAIT FOR LSN on the standby, if a
> tablespace conflict fires during that wait, the WAIT FOR LSN session
> is killed even though it doesn't use the tablespace.
>
> In my test, the failure won't occur after applying the v2 patch.

I see, you were right.  This is not related to the MyProc->xmin.
ResolveRecoveryConflictWithTablespace() calls
GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid).  That
would kill WAIT FOR LSN query independently on its xmin.  I guess your
patch is the only way to go.  It's clumsy to wrap WAIT FOR LSN call
with retry loop, but it would still consume less resources than
polling.

------
Regards,
Alexander Korotkov
Supabase






^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-01-08 16:29  Xuneng Zhou <[email protected]>
  parent: Alexander Korotkov <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Xuneng Zhou @ 2026-01-08 16:29 UTC (permalink / raw)
  To: Alexander Korotkov <[email protected]>; +Cc: Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

Hi,

On Thu, Jan 8, 2026 at 10:19 PM Alexander Korotkov <[email protected]> wrote:
>
> On Wed, Jan 7, 2026 at 6:08 AM Xuneng Zhou <[email protected]> wrote:
> > On Wed, Jan 7, 2026 at 8:32 AM Andres Freund <[email protected]> wrote:
> > > On 2026-01-06 18:42:59 +1300, Thomas Munro wrote:
> > > > Could this be causing the recent flapping failures on CI/macOS in
> > > > recovery/031_recovery_conflict?  I didn't have time to dig personally
> > > > but f30848cb looks relevant:
> > > >
> > > > Waiting for replication conn standby's replay_lsn to pass 0/03467F58 on primary
> > > > error running SQL: 'psql:<stdin>:1: ERROR:  canceling statement due to
> > > > conflict with recovery
> > > > DETAIL:  User was or might have been using tablespace that must be dropped.'
> > > > while running 'psql --no-psqlrc --no-align --tuples-only --quiet
> > > > --dbname port=25195
> > > > host=/var/folders/g9/7rkt8rt1241bwwhd3_s8ndp40000gn/T/LqcCJnsueI
> > > > dbname='postgres' --file - --variable ON_ERROR_STOP=1' with sql 'WAIT
> > > > FOR LSN '0/03467F58' WITH (MODE 'standby_replay', timeout '180s',
> > > > no_throw);' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
> > > > line 2300.
> > > >
> > > > https://cirrus-ci.com/task/5771274900733952
> > > >
> > > > The master branch in time-descending order, macOS tasks only:
> > > >
> > > >      task_id      | substring |  status
> > > > ------------------+-----------+-----------
> > > >  6460882231754752 | c970bdc0  | FAILED
> > > >  5771274900733952 | 6ca8506e  | FAILED
> > > >  6217757068361728 | 63ed3bc7  | FAILED
> > > >  5980650261446656 | ae283736  | FAILED
> > > >  6585898394976256 | 5f13999a  | COMPLETED
> > > >  4527474786172928 | 7f9acc9b  | COMPLETED
> > > >  4826100842364928 | e8d4e94a  | COMPLETED
> > > >  4540563027918848 | b9ee5f2d  | FAILED
> > > >  6358528648019968 | c5af141c  | FAILED
> > > >  5998005284765696 | e212a0f8  | COMPLETED
> > > >  6488580526178304 | b85d5dc0  | FAILED
> > > >  5034091344560128 | 7dc95cc3  | ABORTED
> > > >  5688692477526016 | bb048e31  | COMPLETED
> > > >  5481187977723904 | d351063e  | COMPLETED
> > > >  5101831568752640 | f30848cb  | COMPLETED <-- the change
> > > >  6395317408497664 | 3f33b63d  | COMPLETED
> > > >  6741325208354816 | 877ae5db  | COMPLETED
> > > >  4594007789010944 | de746e0d  | COMPLETED
> > > >  6497208998035456 | 461b8cc9  | COMPLETED
> > >
> > > The failure rates of this are very high - the majority of the CI runs on the
> > > postgres/postgres repos failed since the change went in. Which then also means
> > > cfbot has a very high spurious failure rate. I think we need to revert this
> > > change until the problem has been verified as fixed.
> >
> > This specific failure can be reproduced with this patch v1.
> >
> > I guess the potential race condition is: when
> > wait_for_replay_catchup() runs WAIT FOR LSN on the standby, if a
> > tablespace conflict fires during that wait, the WAIT FOR LSN session
> > is killed even though it doesn't use the tablespace.
> >
> > In my test, the failure won't occur after applying the v2 patch.
>
> I see, you were right.  This is not related to the MyProc->xmin.
> ResolveRecoveryConflictWithTablespace() calls
> GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid).  That
> would kill WAIT FOR LSN query independently on its xmin.

I think the concern is valid --- conflicts like
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT could occur and terminate the
backend if the timing is unlucky. It's more difficult to reproduce
though. A check for the log containing "conflict with recovery" would
likely catch these conflicts as well.

> I guess your
> patch is the only way to go.  It's clumsy to wrap WAIT FOR LSN call
> with retry loop, but it would still consume less resources than
> polling.
>

Assuming recovery conflicts are relatively rare in tap tests, except
for the explicitly designed tests like 031_recovery_conflict and the
narrow timing window that the standby has not caught up while the wait
for gets invoked, a simple fallback seems appropriate to me.

-- 
Best,
Xuneng






^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-01-08 20:42  Alexander Korotkov <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Alexander Korotkov @ 2026-01-08 20:42 UTC (permalink / raw)
  To: Xuneng Zhou <[email protected]>; +Cc: Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

On Thu, Jan 8, 2026 at 6:29 PM Xuneng Zhou <[email protected]> wrote:
> On Thu, Jan 8, 2026 at 10:19 PM Alexander Korotkov <[email protected]> wrote:
> > I see, you were right.  This is not related to the MyProc->xmin.
> > ResolveRecoveryConflictWithTablespace() calls
> > GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid).  That
> > would kill WAIT FOR LSN query independently on its xmin.
>
> I think the concern is valid --- conflicts like
> PROCSIG_RECOVERY_CONFLICT_SNAPSHOT could occur and terminate the
> backend if the timing is unlucky. It's more difficult to reproduce
> though. A check for the log containing "conflict with recovery" would
> likely catch these conflicts as well.

Yes, I found multiple reasons why xmin gets temporarily set during
processing of WAIT FOR LSN query.  I'll soon post a draft patch to fix
that.

> > I guess your
> > patch is the only way to go.  It's clumsy to wrap WAIT FOR LSN call
> > with retry loop, but it would still consume less resources than
> > polling.
> >
>
> Assuming recovery conflicts are relatively rare in tap tests, except
> for the explicitly designed tests like 031_recovery_conflict and the
> narrow timing window that the standby has not caught up while the wait
> for gets invoked, a simple fallback seems appropriate to me.

Yes, I see.  Seems acceptable given this seems the only feasible way to go.

------
Regards,
Alexander Korotkov
Supabase






^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-01-09 13:44  Xuneng Zhou <[email protected]>
  parent: Alexander Korotkov <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Xuneng Zhou @ 2026-01-09 13:44 UTC (permalink / raw)
  To: Alexander Korotkov <[email protected]>; +Cc: Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

Hi,

On Fri, Jan 9, 2026 at 4:42 AM Alexander Korotkov <[email protected]> wrote:
>
> On Thu, Jan 8, 2026 at 6:29 PM Xuneng Zhou <[email protected]> wrote:
> > On Thu, Jan 8, 2026 at 10:19 PM Alexander Korotkov <[email protected]> wrote:
> > > I see, you were right.  This is not related to the MyProc->xmin.
> > > ResolveRecoveryConflictWithTablespace() calls
> > > GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid).  That
> > > would kill WAIT FOR LSN query independently on its xmin.
> >
> > I think the concern is valid --- conflicts like
> > PROCSIG_RECOVERY_CONFLICT_SNAPSHOT could occur and terminate the
> > backend if the timing is unlucky. It's more difficult to reproduce
> > though. A check for the log containing "conflict with recovery" would
> > likely catch these conflicts as well.
>
> Yes, I found multiple reasons why xmin gets temporarily set during
> processing of WAIT FOR LSN query.  I'll soon post a draft patch to fix
> that.
>
> > > I guess your
> > > patch is the only way to go.  It's clumsy to wrap WAIT FOR LSN call
> > > with retry loop, but it would still consume less resources than
> > > polling.
> > >
> >
> > Assuming recovery conflicts are relatively rare in tap tests, except
> > for the explicitly designed tests like 031_recovery_conflict and the
> > narrow timing window that the standby has not caught up while the wait
> > for gets invoked, a simple fallback seems appropriate to me.
>
> Yes, I see.  Seems acceptable given this seems the only feasible way to go.
>

Here is the updated patch with recovery conflicts handled.

-- 
Best,
Xuneng


Attachments:

  [application/octet-stream] v1-0001-Use-WAIT-FOR-LSN-in-PostgreSQL-Test-Cluster-wait_.patch (6.3K, 2-v1-0001-Use-WAIT-FOR-LSN-in-PostgreSQL-Test-Cluster-wait_.patch)
  download | inline diff:
From 1b1aa652aff6681e5f43eba4f4690b174052d478 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Fri, 9 Jan 2026 21:32:12 +0800
Subject: [PATCH v1] Use WAIT FOR LSN in
 PostgreSQL::Test::Cluster::wait_for_catchup()

When the standby is passed as a PostgreSQL::Test::Cluster instance,
use the WAIT FOR LSN command on the standby server to implement
wait_for_catchup() for replay, write, and flush modes.  This is more
efficient than polling pg_stat_replication on the upstream, as the
WAIT FOR LSN command uses a latch-based wakeup mechanism.

The optimization applies when:
- The standby is passed as a Cluster object (not just a name string)
- The mode is 'replay', 'write', or 'flush' (not 'sent')
- The standby is in recovery

For 'sent' mode, when the standby is passed as a string (e.g., a
subscription name for logical replication), or when the standby has
been promoted, the function falls back to the original polling-based
approach using pg_stat_replication on the upstream.

Additionally, if the WAIT FOR LSN session is killed by a recovery
conflict (e.g., DROP TABLESPACE killing all backends indiscriminately),
the function catches this error and falls back to polling.  This makes
the test infrastructure robust against the timing-dependent conflicts
that can occur in tests like 031_recovery_conflict.

Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com
Author: Xuneng Zhou <[email protected]>
Reviewed-by: Alexander Korotkov <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 91 +++++++++++++++++++++++-
 1 file changed, 90 insertions(+), 1 deletion(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 955dfc0e7f8..87c3d2750cb 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -3320,6 +3320,13 @@ If you pass an explicit value of target_lsn, it should almost always be
 the primary's write LSN; so this parameter is seldom needed except when
 querying some intermediate replication node rather than the primary.
 
+When the standby is passed as a PostgreSQL::Test::Cluster instance and is
+in recovery, this function uses the WAIT FOR LSN command on the standby
+for modes replay, write, and flush.  This is more efficient than polling
+pg_stat_replication on the upstream, as WAIT FOR LSN uses a latch-based
+wakeup mechanism.  For 'sent' mode, or when the standby is passed as a
+string (e.g., a subscription name), it falls back to polling.
+
 If there is no active replication connection from this peer, waits until
 poll_query_until timeout.
 
@@ -3339,10 +3346,13 @@ sub wait_for_catchup
 	  . join(', ', keys(%valid_modes))
 	  unless exists($valid_modes{$mode});
 
-	# Allow passing of a PostgreSQL::Test::Cluster instance as shorthand
+	# Keep a reference to the standby node if passed as an object, so we can
+	# use WAIT FOR LSN on it later.
+	my $standby_node;
 	if (blessed($standby_name)
 		&& $standby_name->isa("PostgreSQL::Test::Cluster"))
 	{
+		$standby_node = $standby_name;
 		$standby_name = $standby_name->name;
 	}
 	if (!defined($target_lsn))
@@ -3367,6 +3377,85 @@ sub wait_for_catchup
 	  . $self->name . "\n";
 	# Before release 12 walreceiver just set the application name to
 	# "walreceiver"
+
+	# Use WAIT FOR LSN on the standby when:
+	# - The standby was passed as a Cluster object (so we can connect to it)
+	# - The mode is replay, write, or flush (not 'sent')
+	# - The standby is in recovery
+	# This is more efficient than polling pg_stat_replication on the upstream,
+	# as WAIT FOR LSN uses a latch-based wakeup mechanism.
+	if (defined($standby_node) && ($mode ne 'sent'))
+	{
+		my $standby_in_recovery =
+		  $standby_node->safe_psql('postgres', "SELECT pg_is_in_recovery()");
+		chomp($standby_in_recovery);
+
+		if ($standby_in_recovery eq 't')
+		{
+			# Map mode names to WAIT FOR LSN mode names
+			my %mode_map = (
+				'replay' => 'standby_replay',
+				'write'  => 'standby_write',
+				'flush'  => 'standby_flush',
+			);
+			my $wait_mode = $mode_map{$mode};
+			my $timeout = $PostgreSQL::Test::Utils::timeout_default;
+			my $wait_query =
+			  qq[WAIT FOR LSN '${target_lsn}' WITH (MODE '${wait_mode}', timeout '${timeout}s', no_throw);];
+
+			# Try WAIT FOR LSN. If it succeeds, we're done. If it returns a
+			# non-success status (timeout, not_in_recovery), fail immediately.
+			# If the session is interrupted (e.g., killed by recovery conflict),
+			# fall back to polling on the upstream which is immune to standby-
+			# side conflicts.
+			my $output;
+			local $@;
+			my $wait_succeeded = eval {
+				$output = $standby_node->safe_psql('postgres', $wait_query);
+				chomp($output);
+				1;
+			};
+
+			if ($wait_succeeded && $output eq 'success')
+			{
+				print "done\n";
+				return;
+			}
+
+			# If WAIT FOR LSN executed but returned non-success (e.g., timeout,
+			# not_in_recovery), fail immediately with diagnostic info. Falling
+			# back to polling would just waste time.
+			if ($wait_succeeded)
+			{
+				my $details = $self->safe_psql('postgres',
+					"SELECT * FROM pg_catalog.pg_stat_replication");
+				diag qq(WAIT FOR LSN returned '$output'
+pg_stat_replication on upstream:
+${details});
+				croak "WAIT FOR LSN '$wait_mode' to '$target_lsn' returned '$output'";
+			}
+
+			# WAIT FOR LSN was interrupted. Only fall back to polling if this
+			# looks like a recovery conflict - the canonical PostgreSQL error
+			# message contains "conflict with recovery". Other errors should
+			# fail immediately rather than being masked by a silent fallback.
+			if ($@ =~ /conflict with recovery/i)
+			{
+				diag qq(WAIT FOR LSN interrupted, falling back to polling:
+$@);
+			}
+			else
+			{
+				croak "WAIT FOR LSN failed: $@";
+			}
+		}
+	}
+
+	# Fall back to polling pg_stat_replication on the upstream for:
+	# - 'sent' mode (no corresponding WAIT FOR LSN mode)
+	# - When standby_name is a string (e.g., subscription name)
+	# - When the standby is no longer in recovery (was promoted)
+	# - When WAIT FOR LSN was interrupted (e.g., killed by a recovery conflict)
 	my $query = qq[SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming'
          FROM pg_catalog.pg_stat_replication
          WHERE application_name IN ('$standby_name', 'walreceiver')];
-- 
2.51.0



^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-01-10 04:47  Xuneng Zhou <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Xuneng Zhou @ 2026-01-10 04:47 UTC (permalink / raw)
  To: Alexander Korotkov <[email protected]>; +Cc: Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

On Fri, Jan 9, 2026 at 9:44 PM Xuneng Zhou <[email protected]> wrote:
>
> Hi,
>
> On Fri, Jan 9, 2026 at 4:42 AM Alexander Korotkov <[email protected]> wrote:
> >
> > On Thu, Jan 8, 2026 at 6:29 PM Xuneng Zhou <[email protected]> wrote:
> > > On Thu, Jan 8, 2026 at 10:19 PM Alexander Korotkov <[email protected]> wrote:
> > > > I see, you were right.  This is not related to the MyProc->xmin.
> > > > ResolveRecoveryConflictWithTablespace() calls
> > > > GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid).  That
> > > > would kill WAIT FOR LSN query independently on its xmin.
> > >
> > > I think the concern is valid --- conflicts like
> > > PROCSIG_RECOVERY_CONFLICT_SNAPSHOT could occur and terminate the
> > > backend if the timing is unlucky. It's more difficult to reproduce
> > > though. A check for the log containing "conflict with recovery" would
> > > likely catch these conflicts as well.
> >
> > Yes, I found multiple reasons why xmin gets temporarily set during
> > processing of WAIT FOR LSN query.  I'll soon post a draft patch to fix
> > that.
> >
> > > > I guess your
> > > > patch is the only way to go.  It's clumsy to wrap WAIT FOR LSN call
> > > > with retry loop, but it would still consume less resources than
> > > > polling.
> > > >
> > >
> > > Assuming recovery conflicts are relatively rare in tap tests, except
> > > for the explicitly designed tests like 031_recovery_conflict and the
> > > narrow timing window that the standby has not caught up while the wait
> > > for gets invoked, a simple fallback seems appropriate to me.
> >
> > Yes, I see.  Seems acceptable given this seems the only feasible way to go.
> >
>
> Here is the updated patch with recovery conflicts handled.

V2 corrected the commit message to state " if the WAIT FOR LSN session
is interrupted by a recovery conflict (e.g., DROP TABLESPACE
triggering conflicts on all backends),".  In this case, the statement
is canceled when possible; in some states (idle in transaction or
subtransaction) the session may be terminated.


-- 
Best,
Xuneng


Attachments:

  [application/octet-stream] v2-0001-Use-WAIT-FOR-LSN-in-PostgreSQL-Test-Cluster-wait_.patch (6.3K, 2-v2-0001-Use-WAIT-FOR-LSN-in-PostgreSQL-Test-Cluster-wait_.patch)
  download | inline diff:
From 8d92735d473b974bfe53183615c448792ad209dc Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Fri, 9 Jan 2026 21:32:12 +0800
Subject: [PATCH v2] Use WAIT FOR LSN in
 PostgreSQL::Test::Cluster::wait_for_catchup()

When the standby is passed as a PostgreSQL::Test::Cluster instance,
use the WAIT FOR LSN command on the standby server to implement
wait_for_catchup() for replay, write, and flush modes.  This is more
efficient than polling pg_stat_replication on the upstream, as the
WAIT FOR LSN command uses a latch-based wakeup mechanism.

The optimization applies when:
- The standby is passed as a Cluster object (not just a name string)
- The mode is 'replay', 'write', or 'flush' (not 'sent')
- The standby is in recovery

For 'sent' mode, when the standby is passed as a string (e.g., a
subscription name for logical replication), or when the standby has
been promoted, the function falls back to the original polling-based
approach using pg_stat_replication on the upstream.

Additionally, if the WAIT FOR LSN session is interrupted by a recovery
conflict (e.g., DROP TABLESPACE triggering conflicts on all backends),
the function catches this error and falls back to polling.  This makes
the test infrastructure robust against the timing-dependent conflicts
that can occur in tests like 031_recovery_conflict.

Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com
Author: Xuneng Zhou <[email protected]>
Reviewed-by: Alexander Korotkov <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 91 +++++++++++++++++++++++-
 1 file changed, 90 insertions(+), 1 deletion(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 955dfc0e7f8..87c3d2750cb 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -3320,6 +3320,13 @@ If you pass an explicit value of target_lsn, it should almost always be
 the primary's write LSN; so this parameter is seldom needed except when
 querying some intermediate replication node rather than the primary.
 
+When the standby is passed as a PostgreSQL::Test::Cluster instance and is
+in recovery, this function uses the WAIT FOR LSN command on the standby
+for modes replay, write, and flush.  This is more efficient than polling
+pg_stat_replication on the upstream, as WAIT FOR LSN uses a latch-based
+wakeup mechanism.  For 'sent' mode, or when the standby is passed as a
+string (e.g., a subscription name), it falls back to polling.
+
 If there is no active replication connection from this peer, waits until
 poll_query_until timeout.
 
@@ -3339,10 +3346,13 @@ sub wait_for_catchup
 	  . join(', ', keys(%valid_modes))
 	  unless exists($valid_modes{$mode});
 
-	# Allow passing of a PostgreSQL::Test::Cluster instance as shorthand
+	# Keep a reference to the standby node if passed as an object, so we can
+	# use WAIT FOR LSN on it later.
+	my $standby_node;
 	if (blessed($standby_name)
 		&& $standby_name->isa("PostgreSQL::Test::Cluster"))
 	{
+		$standby_node = $standby_name;
 		$standby_name = $standby_name->name;
 	}
 	if (!defined($target_lsn))
@@ -3367,6 +3377,85 @@ sub wait_for_catchup
 	  . $self->name . "\n";
 	# Before release 12 walreceiver just set the application name to
 	# "walreceiver"
+
+	# Use WAIT FOR LSN on the standby when:
+	# - The standby was passed as a Cluster object (so we can connect to it)
+	# - The mode is replay, write, or flush (not 'sent')
+	# - The standby is in recovery
+	# This is more efficient than polling pg_stat_replication on the upstream,
+	# as WAIT FOR LSN uses a latch-based wakeup mechanism.
+	if (defined($standby_node) && ($mode ne 'sent'))
+	{
+		my $standby_in_recovery =
+		  $standby_node->safe_psql('postgres', "SELECT pg_is_in_recovery()");
+		chomp($standby_in_recovery);
+
+		if ($standby_in_recovery eq 't')
+		{
+			# Map mode names to WAIT FOR LSN mode names
+			my %mode_map = (
+				'replay' => 'standby_replay',
+				'write'  => 'standby_write',
+				'flush'  => 'standby_flush',
+			);
+			my $wait_mode = $mode_map{$mode};
+			my $timeout = $PostgreSQL::Test::Utils::timeout_default;
+			my $wait_query =
+			  qq[WAIT FOR LSN '${target_lsn}' WITH (MODE '${wait_mode}', timeout '${timeout}s', no_throw);];
+
+			# Try WAIT FOR LSN. If it succeeds, we're done. If it returns a
+			# non-success status (timeout, not_in_recovery), fail immediately.
+			# If the session is interrupted (e.g., killed by recovery conflict),
+			# fall back to polling on the upstream which is immune to standby-
+			# side conflicts.
+			my $output;
+			local $@;
+			my $wait_succeeded = eval {
+				$output = $standby_node->safe_psql('postgres', $wait_query);
+				chomp($output);
+				1;
+			};
+
+			if ($wait_succeeded && $output eq 'success')
+			{
+				print "done\n";
+				return;
+			}
+
+			# If WAIT FOR LSN executed but returned non-success (e.g., timeout,
+			# not_in_recovery), fail immediately with diagnostic info. Falling
+			# back to polling would just waste time.
+			if ($wait_succeeded)
+			{
+				my $details = $self->safe_psql('postgres',
+					"SELECT * FROM pg_catalog.pg_stat_replication");
+				diag qq(WAIT FOR LSN returned '$output'
+pg_stat_replication on upstream:
+${details});
+				croak "WAIT FOR LSN '$wait_mode' to '$target_lsn' returned '$output'";
+			}
+
+			# WAIT FOR LSN was interrupted. Only fall back to polling if this
+			# looks like a recovery conflict - the canonical PostgreSQL error
+			# message contains "conflict with recovery". Other errors should
+			# fail immediately rather than being masked by a silent fallback.
+			if ($@ =~ /conflict with recovery/i)
+			{
+				diag qq(WAIT FOR LSN interrupted, falling back to polling:
+$@);
+			}
+			else
+			{
+				croak "WAIT FOR LSN failed: $@";
+			}
+		}
+	}
+
+	# Fall back to polling pg_stat_replication on the upstream for:
+	# - 'sent' mode (no corresponding WAIT FOR LSN mode)
+	# - When standby_name is a string (e.g., subscription name)
+	# - When the standby is no longer in recovery (was promoted)
+	# - When WAIT FOR LSN was interrupted (e.g., killed by a recovery conflict)
 	my $query = qq[SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming'
          FROM pg_catalog.pg_stat_replication
          WHERE application_name IN ('$standby_name', 'walreceiver')];
-- 
2.51.0



^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-01-12 06:53  Xuneng Zhou <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Xuneng Zhou @ 2026-01-12 06:53 UTC (permalink / raw)
  To: Alexander Korotkov <[email protected]>; +Cc: Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

Hi Alexander,

On Sat, Jan 10, 2026 at 12:47 PM Xuneng Zhou <[email protected]> wrote:
>
> On Fri, Jan 9, 2026 at 9:44 PM Xuneng Zhou <[email protected]> wrote:
> >
> > Hi,
> >
> > On Fri, Jan 9, 2026 at 4:42 AM Alexander Korotkov <[email protected]> wrote:
> > >
> > > On Thu, Jan 8, 2026 at 6:29 PM Xuneng Zhou <[email protected]> wrote:
> > > > On Thu, Jan 8, 2026 at 10:19 PM Alexander Korotkov <[email protected]> wrote:
> > > > > I see, you were right.  This is not related to the MyProc->xmin.
> > > > > ResolveRecoveryConflictWithTablespace() calls
> > > > > GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid).  That
> > > > > would kill WAIT FOR LSN query independently on its xmin.
> > > >
> > > > I think the concern is valid --- conflicts like
> > > > PROCSIG_RECOVERY_CONFLICT_SNAPSHOT could occur and terminate the
> > > > backend if the timing is unlucky. It's more difficult to reproduce
> > > > though. A check for the log containing "conflict with recovery" would
> > > > likely catch these conflicts as well.
> > >
> > > Yes, I found multiple reasons why xmin gets temporarily set during
> > > processing of WAIT FOR LSN query.  I'll soon post a draft patch to fix
> > > that.
> > >
> > > > > I guess your
> > > > > patch is the only way to go.  It's clumsy to wrap WAIT FOR LSN call
> > > > > with retry loop, but it would still consume less resources than
> > > > > polling.
> > > > >
> > > >
> > > > Assuming recovery conflicts are relatively rare in tap tests, except
> > > > for the explicitly designed tests like 031_recovery_conflict and the
> > > > narrow timing window that the standby has not caught up while the wait
> > > > for gets invoked, a simple fallback seems appropriate to me.
> > >
> > > Yes, I see.  Seems acceptable given this seems the only feasible way to go.
> > >
> >
> > Here is the updated patch with recovery conflicts handled.
>
> V2 corrected the commit message to state " if the WAIT FOR LSN session
> is interrupted by a recovery conflict (e.g., DROP TABLESPACE
> triggering conflicts on all backends),".  In this case, the statement
> is canceled when possible; in some states (idle in transaction or
> subtransaction) the session may be terminated.
>

The attached patch avoids a syscache lookup while constructing the
tuple descriptor for WAIT FOR LSN, so that a catalog snapshot is not
re-established after the wait finishes.

The standard output path (printtup) may still briefly establish a
catalog snapshot during result emission, but this seems acceptable:
the snapshot window is narrow to emit a single row. A fully
catalog-free output path would require either bypassing the
DestReceiver lifecycle (breaking layering) or adding a custom receiver
(added complexity for marginal benefit). The current approach is
simpler and might be sufficient unless output-phase conflicts are
observed a lot in practice. Does this make sense to you?

--
Best,
Xuneng


Attachments:

  [application/octet-stream] v1-0001-Avoid-syscache-lookup-in-WAIT-FOR-LSN-tuple-descr.patch (2.4K, 2-v1-0001-Avoid-syscache-lookup-in-WAIT-FOR-LSN-tuple-descr.patch)
  download | inline diff:
From 50beec4de4e078020b03667453458cd440c26267 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Mon, 12 Jan 2026 14:36:27 +0800
Subject: [PATCH v1] Avoid syscache lookup in WAIT FOR LSN tuple descriptor

Use TupleDescInitBuiltinEntry instead of TupleDescInitEntry when
building the result tuple descriptor for WAIT FOR LSN. This avoids
a syscache access that could re-establish a catalog snapshot after
we've explicitly released all snapshots before the wait.
---
 src/backend/commands/wait.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
index 97f1e778488..191c1877125 100644
--- a/src/backend/commands/wait.c
+++ b/src/backend/commands/wait.c
@@ -15,6 +15,7 @@
 
 #include <math.h>
 
+#include "access/tupdesc.h"
 #include "access/xlog.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogwait.h"
@@ -320,7 +321,17 @@ ExecWaitStmt(ParseState *pstate, WaitStmt *stmt, DestReceiver *dest)
 			break;
 	}
 
-	/* need a tuple descriptor representing a single TEXT column */
+	/*
+	 * Output the result.
+	 *
+	 * We use TupleDescInitBuiltinEntry in WaitStmtResultDesc to avoid
+	 * syscache access when building the tuple descriptor. The standard output
+	 * path may briefly establish a catalog snapshot during output, but this
+	 * is acceptable since: 1. The snapshot window is very brief (just
+	 * emitting one row) 2. The critical section (the wait itself) is already
+	 * snapshot-free 3. Using the standard path respects receiver lifecycle
+	 * and semantics
+	 */
 	tupdesc = WaitStmtResultDesc(stmt);
 
 	/* prepare for projection of tuples */
@@ -337,9 +348,16 @@ WaitStmtResultDesc(WaitStmt *stmt)
 {
 	TupleDesc	tupdesc;
 
-	/* Need a tuple descriptor representing a single TEXT  column */
+	/*
+	 * Need a tuple descriptor representing a single TEXT column.
+	 *
+	 * We use TupleDescInitBuiltinEntry instead of TupleDescInitEntry to avoid
+	 * syscache access. This is important because WaitStmtResultDesc may be
+	 * called after snapshots have been released, and we must not re-establish
+	 * a catalog snapshot which could cause recovery conflicts on a standby.
+	 */
 	tupdesc = CreateTemplateTupleDesc(1);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-					   TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "status",
+							  TEXTOID, -1, 0);
 	return tupdesc;
 }
-- 
2.51.0



^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-01-20 01:28  Xuneng Zhou <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Xuneng Zhou @ 2026-01-20 01:28 UTC (permalink / raw)
  To: Alexander Korotkov <[email protected]>; +Cc: Peter Eisentraut <[email protected]>; Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

Hi,

Peter pointed out redundant pg_unreachable() calls after elog(ERROR)
in wait.c. Attached patch removes them.

--
Best,
Xuneng


Attachments:

  [application/octet-stream] v1-0001-WAIT-FOR-Remove-redundant-pg_unreachable-after-el.patch (1.3K, 2-v1-0001-WAIT-FOR-Remove-redundant-pg_unreachable-after-el.patch)
  download | inline diff:
From b7e3036d1872b96303f7defdad2f6f9332e89f8e Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 20 Jan 2026 08:40:24 +0800
Subject: [PATCH v1] WAIT FOR: Remove redundant pg_unreachable() after
 elog(ERROR)

elog(ERROR) never returns, so the following pg_unreachable() calls
are unnecessary. This pattern is useful for non-void functions, but
these cases are in a void function.
---
 src/backend/commands/wait.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
index 54f2df2425f..212a0407871 100644
--- a/src/backend/commands/wait.c
+++ b/src/backend/commands/wait.c
@@ -236,7 +236,6 @@ ExecWaitStmt(ParseState *pstate, WaitStmt *stmt, DestReceiver *dest)
 
 					default:
 						elog(ERROR, "unexpected wait LSN type %d", lsnType);
-						pg_unreachable();
 				}
 			}
 			else
@@ -281,7 +280,6 @@ ExecWaitStmt(ParseState *pstate, WaitStmt *stmt, DestReceiver *dest)
 
 						default:
 							elog(ERROR, "unexpected wait LSN type %d", lsnType);
-							pg_unreachable();
 					}
 				}
 				else
@@ -311,7 +309,6 @@ ExecWaitStmt(ParseState *pstate, WaitStmt *stmt, DestReceiver *dest)
 
 						default:
 							elog(ERROR, "unexpected wait LSN type %d", lsnType);
-							pg_unreachable();
 					}
 				}
 			}
-- 
2.51.0



^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-01-27 01:14  Xuneng Zhou <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Xuneng Zhou @ 2026-01-27 01:14 UTC (permalink / raw)
  To: Alexander Korotkov <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Peter Eisentraut <[email protected]>; Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

Hi Alexander,

Heikki spotted a misplaced wake-up call for replay waiters in
PerformWalRecovery. He suggested that the WaitLSNWakeup needs to be
invoked immediately after wal record is applied to avoid the potential
missed wake-ups when recovery stops/pauses/promotes. It makes sense to
me. Please check the attached patch to fix that.

-- 
Best,
Xuneng


Attachments:

  [application/octet-stream] v1-0001-Wake-LSN-waiters-before-recovery-target-stop.patch (1.6K, 2-v1-0001-Wake-LSN-waiters-before-recovery-target-stop.patch)
  download | inline diff:
From 3e692c636838a928918266174c3e139274384702 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 27 Jan 2026 09:00:39 +0800
Subject: [PATCH v1] Wake LSN waiters before recovery target stop

  Move WaitLSNWakeup() immediately after ApplyWalRecord() so waiters are
  signaled even when recoveryStopsAfter() breaks out for pause/promotion
  targets.
---
 src/backend/access/transam/xlogrecovery.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index b9393e551b7..36daa3d8587 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1777,13 +1777,6 @@ PerformWalRecovery(void)
 			 */
 			ApplyWalRecord(xlogreader, record, &replayTLI);
 
-			/* Exit loop if we reached inclusive recovery target */
-			if (recoveryStopsAfter(xlogreader))
-			{
-				reachedRecoveryTarget = true;
-				break;
-			}
-
 			/*
 			 * If we replayed an LSN that someone was waiting for then walk
 			 * over the shared memory array and set latches to notify the
@@ -1794,6 +1787,13 @@ PerformWalRecovery(void)
 				 pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_REPLAY])))
 				WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_REPLAY, XLogRecoveryCtl->lastReplayedEndRecPtr);
 
+			/* Exit loop if we reached inclusive recovery target */
+			if (recoveryStopsAfter(xlogreader))
+			{
+				reachedRecoveryTarget = true;
+				break;
+			}
+
 			/* Else, try to fetch the next WAL record */
 			record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
 		} while (record != NULL);
-- 
2.51.0



^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-01-29 07:47  Alexander Korotkov <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Alexander Korotkov @ 2026-01-29 07:47 UTC (permalink / raw)
  To: Xuneng Zhou <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Peter Eisentraut <[email protected]>; Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

On Tue, Jan 27, 2026 at 3:14 AM Xuneng Zhou <[email protected]> wrote:
> Heikki spotted a misplaced wake-up call for replay waiters in
> PerformWalRecovery. He suggested that the WaitLSNWakeup needs to be
> invoked immediately after wal record is applied to avoid the potential
> missed wake-ups when recovery stops/pauses/promotes. It makes sense to
> me. Please check the attached patch to fix that.

Pushed, thank you!

------
Regards,
Alexander Korotkov
Supabase






^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-05-19 20:00  Alexander Lakhin <[email protected]>
  parent: Alexander Korotkov <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Alexander Lakhin @ 2026-05-19 20:00 UTC (permalink / raw)
  To: Alexander Korotkov <[email protected]>; Xuneng Zhou <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Peter Eisentraut <[email protected]>; Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

Hello Alexander and Xuneng,

06.04.2026 22:49, Alexander Korotkov wrote:
> Thank you, I've pushed your version of patchset.  I made two minor
> corrections for patch #2: mention default mode value in the header
> comment, and fallback to polling on has_wal_read_bug sparc64+ext4 bug.

I discovered a new test failure, that is apparently caused by new
wait_for_catchup() implementation [1]:
[06:20:23.110](1.069s) not ok 8 - check that the slot state changes to "extended"
[06:20:23.110](0.001s) #   Failed test 'check that the slot state changes to "extended"'
#   at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 140.
[06:20:23.111](0.000s) #          got: 'unreserved'
#     expected: 'extended'
[06:20:23.231](0.120s) not ok 9 - check that the slot state changes to "unreserved"
[06:20:23.231](0.000s) #   Failed test 'check that the slot state changes to "unreserved"'
#   at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 152.
[06:20:23.231](0.000s) #          got: 'lost|'
#     expected: 'unreserved|t'

I've managed to reproduce such failures with:
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 07eac07b9ce..493ce92674e 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1143,2 +1143,3 @@ XLogWalRcvSendReply(bool force, bool requestReply, bool checkApply)

+pg_usleep(10000);
      /* Get current timestamp. */
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 04aa770d981..19cda3a6b51 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2521,2 +2521,3 @@ ProcessStandbyReplyMessage(void)

+pg_usleep(100000);
      /* the caller already consumed the msgtype byte */

Concretely, a loop:
for i in {1..100}; do echo "ITERATION $i"; PROVE_TESTS="t/019*" make -s check -C src/test/recovery/ || break; done
failed for me on iterations 2, 1, 7:
ITERATION 7
# +++ tap check in src/test/recovery +++
t/019_replslot_limit.pl .. 8/?
#   Failed test 'check that the slot state changes to "extended"'
#   at t/019_replslot_limit.pl line 140.
#          got: 'unreserved'
#     expected: 'extended'
t/019_replslot_limit.pl .. 26/? # Looks like you failed 1 test of 26.
t/019_replslot_limit.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/26 subtests

With "WAIT FOR LSN" in wait_for_catchup() disabled, 100 iterations
passed.

Having extra logging added, I could see the key difference.
Failed run:
2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl LOG:  !!!GetWALAvailability| targetLSN: 
0/016C0000, targetSeg: 22, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 22
2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl STATEMENT:  SELECT wal_status FROM 
pg_replication_slots WHERE slot_name = 'rep1'
vs
Successful run:
2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl LOG:  !!!GetWALAvailability| targetLSN: 
0/01700000, targetSeg: 23, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 23
2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl STATEMENT:  SELECT wal_status FROM 
pg_replication_slots WHERE slot_name = 'rep1'

That is, with WAIT FOR LSN, primary in this test may advance
slot->data.restart_lsn to the expected position after wait_for_catchup()
returns.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=goldfish&dt=2026-05-13%2006%3A15%3A03

Best regards,
Alexander

^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-05-20 03:30  Xuneng Zhou <[email protected]>
  parent: Alexander Lakhin <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Xuneng Zhou @ 2026-05-20 03:30 UTC (permalink / raw)
  To: Alexander Lakhin <[email protected]>; +Cc: Alexander Korotkov <[email protected]>; Heikki Linnakangas <[email protected]>; Peter Eisentraut <[email protected]>; Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

Hi,

On Tue, May 19, 2026 at 1:00 PM Alexander Lakhin <[email protected]> wrote:
>
> Hello Alexander and Xuneng,
>
> 06.04.2026 22:49, Alexander Korotkov wrote:
>
> Thank you, I've pushed your version of patchset.  I made two minor
> corrections for patch #2: mention default mode value in the header
> comment, and fallback to polling on has_wal_read_bug sparc64+ext4 bug.
>
>
> I discovered a new test failure, that is apparently caused by new
> wait_for_catchup() implementation [1]:
> [06:20:23.110](1.069s) not ok 8 - check that the slot state changes to "extended"
> [06:20:23.110](0.001s) #   Failed test 'check that the slot state changes to "extended"'
> #   at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 140.
> [06:20:23.111](0.000s) #          got: 'unreserved'
> #     expected: 'extended'
> [06:20:23.231](0.120s) not ok 9 - check that the slot state changes to "unreserved"
> [06:20:23.231](0.000s) #   Failed test 'check that the slot state changes to "unreserved"'
> #   at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 152.
> [06:20:23.231](0.000s) #          got: 'lost|'
> #     expected: 'unreserved|t'
>
> I've managed to reproduce such failures with:
> diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
> index 07eac07b9ce..493ce92674e 100644
> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -1143,2 +1143,3 @@ XLogWalRcvSendReply(bool force, bool requestReply, bool checkApply)
>
> +pg_usleep(10000);
>      /* Get current timestamp. */
> diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
> index 04aa770d981..19cda3a6b51 100644
> --- a/src/backend/replication/walsender.c
> +++ b/src/backend/replication/walsender.c
> @@ -2521,2 +2521,3 @@ ProcessStandbyReplyMessage(void)
>
> +pg_usleep(100000);
>      /* the caller already consumed the msgtype byte */
>
> Concretely, a loop:
> for i in {1..100}; do echo "ITERATION $i"; PROVE_TESTS="t/019*" make -s check -C src/test/recovery/ || break; done
> failed for me on iterations 2, 1, 7:
> ITERATION 7
> # +++ tap check in src/test/recovery +++
> t/019_replslot_limit.pl .. 8/?
> #   Failed test 'check that the slot state changes to "extended"'
> #   at t/019_replslot_limit.pl line 140.
> #          got: 'unreserved'
> #     expected: 'extended'
> t/019_replslot_limit.pl .. 26/? # Looks like you failed 1 test of 26.
> t/019_replslot_limit.pl .. Dubious, test returned 1 (wstat 256, 0x100)
> Failed 1/26 subtests
>
> With "WAIT FOR LSN" in wait_for_catchup() disabled, 100 iterations
> passed.
>
> Having extra logging added, I could see the key difference.
> Failed run:
> 2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl LOG:  !!!GetWALAvailability| targetLSN: 0/016C0000, targetSeg: 22, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 22
> 2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl STATEMENT:  SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'
> vs
> Successful run:
> 2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl LOG:  !!!GetWALAvailability| targetLSN: 0/01700000, targetSeg: 23, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 23
> 2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl STATEMENT:  SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'
>
> That is, with WAIT FOR LSN, primary in this test may advance
> slot->data.restart_lsn to the expected position after wait_for_catchup()
> returns.
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=goldfish&dt=2026-05-13%2006%3A15%3A03

Thanks for reporting this issue.

I think this is related to the semantic change made earlier:
wait_for_catchup() now returns once the standby itself has reached the
target LSN, rather than waiting until the primary observes that
position via pg_stat_replication.

As a result, the primary may not yet have processed the standby
feedback needed to advance the slot's restart_lsn when
wait_for_catchup() returns.

Actually, I was aware of this semantic change and previously thought
it might be harmless. But this failure appears to disprove that. I'll
prepare a patch to fix this shortly.

-- 
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.






^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-05-20 05:18  Xuneng Zhou <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Xuneng Zhou @ 2026-05-20 05:18 UTC (permalink / raw)
  To: Alexander Lakhin <[email protected]>; +Cc: Alexander Korotkov <[email protected]>; Heikki Linnakangas <[email protected]>; Peter Eisentraut <[email protected]>; Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

On Tue, May 19, 2026 at 8:30 PM Xuneng Zhou <[email protected]> wrote:
>
> Hi,
>
> On Tue, May 19, 2026 at 1:00 PM Alexander Lakhin <[email protected]> wrote:
> >
> > Hello Alexander and Xuneng,
> >
> > 06.04.2026 22:49, Alexander Korotkov wrote:
> >
> > Thank you, I've pushed your version of patchset.  I made two minor
> > corrections for patch #2: mention default mode value in the header
> > comment, and fallback to polling on has_wal_read_bug sparc64+ext4 bug.
> >
> >
> > I discovered a new test failure, that is apparently caused by new
> > wait_for_catchup() implementation [1]:
> > [06:20:23.110](1.069s) not ok 8 - check that the slot state changes to "extended"
> > [06:20:23.110](0.001s) #   Failed test 'check that the slot state changes to "extended"'
> > #   at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 140.
> > [06:20:23.111](0.000s) #          got: 'unreserved'
> > #     expected: 'extended'
> > [06:20:23.231](0.120s) not ok 9 - check that the slot state changes to "unreserved"
> > [06:20:23.231](0.000s) #   Failed test 'check that the slot state changes to "unreserved"'
> > #   at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 152.
> > [06:20:23.231](0.000s) #          got: 'lost|'
> > #     expected: 'unreserved|t'
> >
> > I've managed to reproduce such failures with:
> > diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
> > index 07eac07b9ce..493ce92674e 100644
> > --- a/src/backend/replication/walreceiver.c
> > +++ b/src/backend/replication/walreceiver.c
> > @@ -1143,2 +1143,3 @@ XLogWalRcvSendReply(bool force, bool requestReply, bool checkApply)
> >
> > +pg_usleep(10000);
> >      /* Get current timestamp. */
> > diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
> > index 04aa770d981..19cda3a6b51 100644
> > --- a/src/backend/replication/walsender.c
> > +++ b/src/backend/replication/walsender.c
> > @@ -2521,2 +2521,3 @@ ProcessStandbyReplyMessage(void)
> >
> > +pg_usleep(100000);
> >      /* the caller already consumed the msgtype byte */
> >
> > Concretely, a loop:
> > for i in {1..100}; do echo "ITERATION $i"; PROVE_TESTS="t/019*" make -s check -C src/test/recovery/ || break; done
> > failed for me on iterations 2, 1, 7:
> > ITERATION 7
> > # +++ tap check in src/test/recovery +++
> > t/019_replslot_limit.pl .. 8/?
> > #   Failed test 'check that the slot state changes to "extended"'
> > #   at t/019_replslot_limit.pl line 140.
> > #          got: 'unreserved'
> > #     expected: 'extended'
> > t/019_replslot_limit.pl .. 26/? # Looks like you failed 1 test of 26.
> > t/019_replslot_limit.pl .. Dubious, test returned 1 (wstat 256, 0x100)
> > Failed 1/26 subtests
> >
> > With "WAIT FOR LSN" in wait_for_catchup() disabled, 100 iterations
> > passed.
> >
> > Having extra logging added, I could see the key difference.
> > Failed run:
> > 2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl LOG:  !!!GetWALAvailability| targetLSN: 0/016C0000, targetSeg: 22, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 22
> > 2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl STATEMENT:  SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'
> > vs
> > Successful run:
> > 2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl LOG:  !!!GetWALAvailability| targetLSN: 0/01700000, targetSeg: 23, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 23
> > 2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl STATEMENT:  SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'
> >
> > That is, with WAIT FOR LSN, primary in this test may advance
> > slot->data.restart_lsn to the expected position after wait_for_catchup()
> > returns.
> >
> > [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=goldfish&dt=2026-05-13%2006%3A15%3A03
>
> Thanks for reporting this issue.
>
> I think this is related to the semantic change made earlier:
> wait_for_catchup() now returns once the standby itself has reached the
> target LSN, rather than waiting until the primary observes that
> position via pg_stat_replication.
>
> As a result, the primary may not yet have processed the standby
> feedback needed to advance the slot's restart_lsn when
> wait_for_catchup() returns.
>
> Actually, I was aware of this semantic change and previously thought
> it might be harmless. But this failure appears to disprove that. I'll
> prepare a patch to fix this shortly.

After some consideration, 019_replslot_limit.pl appears to the better
place to place the fix rather than by restoring the old primary-side
polling barrier in wait_for_catchup().

The new wait_for_catchup() behavior is closer to its natural
semantics: for replay/write/flush modes, it waits until the standby
itself has reached the requested LSN. The old implementation used
pg_stat_replication on the primary, which also implied that the
primary had received and processed standby feedback. That was a useful
side effect for this test, but it is not required by most callers.

019_replslot_limit.pl is different because it checks primary-side slot
state. For a physical slot, restart_lsn advances only after the
primary's walsender processes standby feedback. So the test needs an
extra condition beyond ordinary standby catchup.

The patch makes that dependency explicit: wait for the standby to
replay the target LSN, then wait for the slot's restart_lsn on the
primary to pass the same LSN. This keeps wait_for_catchup() focused on
standby catchup while making the slot-specific synchronization visible
in the test that needs it.


-- 
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.


Attachments:

  [application/octet-stream] 0001-Stabilize-replslot-limit-test-after-standby-catchup.patch (3.2K, 2-0001-Stabilize-replslot-limit-test-after-standby-catchup.patch)
  download | inline diff:
From 41eb480a5c52c3ebcee4488e38bef5b8aed59ff5 Mon Sep 17 00:00:00 2001
From: Xuneng Zhou <[email protected]>
Date: Tue, 19 May 2026 22:11:04 -0700
Subject: [PATCH] Stabilize replslot limit test after standby catchup

019_replslot_limit.pl checks primary-side WAL availability for a
physical replication slot after stopping and restarting a standby.
wait_for_catchup() now waits for the standby to reach the target LSN
locally, so it can return before the primary has processed standby
feedback and advanced the slot's restart_lsn.

Make the test wait explicitly for that slot advancement before stopping
the standby.
---
 src/test/recovery/t/019_replslot_limit.pl | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 7b253e64d9c..0c43acdf8a7 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -12,6 +12,16 @@ use PostgreSQL::Test::Cluster;
 use Test::More;
 use Time::HiRes qw(usleep);
 
+sub wait_for_standby_and_slot_catchup
+{
+	my ($primary, $standby, $slot_name) = @_;
+
+	my $target_lsn = $primary->lsn('write');
+
+	$primary->wait_for_catchup($standby, 'replay', $target_lsn);
+	$primary->wait_for_slot_catchup($slot_name, 'restart', $target_lsn);
+}
+
 # Initialize primary node, setting wal-segsize to 1MB
 my $node_primary = PostgreSQL::Test::Cluster->new('primary');
 $node_primary->init(allows_streaming => 1, extra => ['--wal-segsize=1']);
@@ -44,8 +54,9 @@ $node_standby->append_conf('postgresql.conf', "primary_slot_name = 'rep1'");
 
 $node_standby->start;
 
-# Wait until standby has replayed enough data
-$node_primary->wait_for_catchup($node_standby);
+# Wait until the standby has replayed enough data, and the primary has
+# processed feedback advancing the slot's restart_lsn.
+wait_for_standby_and_slot_catchup($node_primary, $node_standby, 'rep1');
 
 # Stop standby
 $node_standby->stop;
@@ -79,7 +90,7 @@ is($result, "reserved|t", 'check that slot is working');
 # The standby can reconnect to primary
 $node_standby->start;
 
-$node_primary->wait_for_catchup($node_standby);
+wait_for_standby_and_slot_catchup($node_primary, $node_standby, 'rep1');
 
 $node_standby->stop;
 
@@ -109,7 +120,7 @@ is($result, "reserved",
 
 # The standby can reconnect to primary
 $node_standby->start;
-$node_primary->wait_for_catchup($node_standby);
+wait_for_standby_and_slot_catchup($node_primary, $node_standby, 'rep1');
 $node_standby->stop;
 
 # wal_keep_size overrides max_slot_wal_keep_size
@@ -128,7 +139,7 @@ $result = $node_primary->safe_psql('postgres',
 
 # The standby can reconnect to primary
 $node_standby->start;
-$node_primary->wait_for_catchup($node_standby);
+wait_for_standby_and_slot_catchup($node_primary, $node_standby, 'rep1');
 $node_standby->stop;
 
 # Advance WAL again without checkpoint, reducing remain by 6 MB.
@@ -155,7 +166,7 @@ is($result, "unreserved|t",
 # The standby still can connect to primary before a checkpoint
 $node_standby->start;
 
-$node_primary->wait_for_catchup($node_standby);
+wait_for_standby_and_slot_catchup($node_primary, $node_standby, 'rep1');
 
 $node_standby->stop;
 
-- 
2.50.1 (Apple Git-155)



^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-05-20 12:16  Alexander Korotkov <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Alexander Korotkov @ 2026-05-20 12:16 UTC (permalink / raw)
  To: Xuneng Zhou <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Heikki Linnakangas <[email protected]>; Peter Eisentraut <[email protected]>; Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

Hi, Xuneng!

On Wed, May 20, 2026 at 8:18 AM Xuneng Zhou <[email protected]> wrote:
> On Tue, May 19, 2026 at 8:30 PM Xuneng Zhou <[email protected]> wrote:
> >
> > Hi,
> >
> > On Tue, May 19, 2026 at 1:00 PM Alexander Lakhin <[email protected]> wrote:
> > >
> > > Hello Alexander and Xuneng,
> > >
> > > 06.04.2026 22:49, Alexander Korotkov wrote:
> > >
> > > Thank you, I've pushed your version of patchset.  I made two minor
> > > corrections for patch #2: mention default mode value in the header
> > > comment, and fallback to polling on has_wal_read_bug sparc64+ext4 bug.
> > >
> > >
> > > I discovered a new test failure, that is apparently caused by new
> > > wait_for_catchup() implementation [1]:
> > > [06:20:23.110](1.069s) not ok 8 - check that the slot state changes to "extended"
> > > [06:20:23.110](0.001s) #   Failed test 'check that the slot state changes to "extended"'
> > > #   at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 140.
> > > [06:20:23.111](0.000s) #          got: 'unreserved'
> > > #     expected: 'extended'
> > > [06:20:23.231](0.120s) not ok 9 - check that the slot state changes to "unreserved"
> > > [06:20:23.231](0.000s) #   Failed test 'check that the slot state changes to "unreserved"'
> > > #   at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 152.
> > > [06:20:23.231](0.000s) #          got: 'lost|'
> > > #     expected: 'unreserved|t'
> > >
> > > I've managed to reproduce such failures with:
> > > diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
> > > index 07eac07b9ce..493ce92674e 100644
> > > --- a/src/backend/replication/walreceiver.c
> > > +++ b/src/backend/replication/walreceiver.c
> > > @@ -1143,2 +1143,3 @@ XLogWalRcvSendReply(bool force, bool requestReply, bool checkApply)
> > >
> > > +pg_usleep(10000);
> > >      /* Get current timestamp. */
> > > diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
> > > index 04aa770d981..19cda3a6b51 100644
> > > --- a/src/backend/replication/walsender.c
> > > +++ b/src/backend/replication/walsender.c
> > > @@ -2521,2 +2521,3 @@ ProcessStandbyReplyMessage(void)
> > >
> > > +pg_usleep(100000);
> > >      /* the caller already consumed the msgtype byte */
> > >
> > > Concretely, a loop:
> > > for i in {1..100}; do echo "ITERATION $i"; PROVE_TESTS="t/019*" make -s check -C src/test/recovery/ || break; done
> > > failed for me on iterations 2, 1, 7:
> > > ITERATION 7
> > > # +++ tap check in src/test/recovery +++
> > > t/019_replslot_limit.pl .. 8/?
> > > #   Failed test 'check that the slot state changes to "extended"'
> > > #   at t/019_replslot_limit.pl line 140.
> > > #          got: 'unreserved'
> > > #     expected: 'extended'
> > > t/019_replslot_limit.pl .. 26/? # Looks like you failed 1 test of 26.
> > > t/019_replslot_limit.pl .. Dubious, test returned 1 (wstat 256, 0x100)
> > > Failed 1/26 subtests
> > >
> > > With "WAIT FOR LSN" in wait_for_catchup() disabled, 100 iterations
> > > passed.
> > >
> > > Having extra logging added, I could see the key difference.
> > > Failed run:
> > > 2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl LOG:  !!!GetWALAvailability| targetLSN: 0/016C0000, targetSeg: 22, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 22
> > > 2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl STATEMENT:  SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'
> > > vs
> > > Successful run:
> > > 2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl LOG:  !!!GetWALAvailability| targetLSN: 0/01700000, targetSeg: 23, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 23
> > > 2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl STATEMENT:  SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'
> > >
> > > That is, with WAIT FOR LSN, primary in this test may advance
> > > slot->data.restart_lsn to the expected position after wait_for_catchup()
> > > returns.
> > >
> > > [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=goldfish&dt=2026-05-13%2006%3A15%3A03
> >
> > Thanks for reporting this issue.
> >
> > I think this is related to the semantic change made earlier:
> > wait_for_catchup() now returns once the standby itself has reached the
> > target LSN, rather than waiting until the primary observes that
> > position via pg_stat_replication.
> >
> > As a result, the primary may not yet have processed the standby
> > feedback needed to advance the slot's restart_lsn when
> > wait_for_catchup() returns.
> >
> > Actually, I was aware of this semantic change and previously thought
> > it might be harmless. But this failure appears to disprove that. I'll
> > prepare a patch to fix this shortly.
>
> After some consideration, 019_replslot_limit.pl appears to the better
> place to place the fix rather than by restoring the old primary-side
> polling barrier in wait_for_catchup().
>
> The new wait_for_catchup() behavior is closer to its natural
> semantics: for replay/write/flush modes, it waits until the standby
> itself has reached the requested LSN. The old implementation used
> pg_stat_replication on the primary, which also implied that the
> primary had received and processed standby feedback. That was a useful
> side effect for this test, but it is not required by most callers.
>
> 019_replslot_limit.pl is different because it checks primary-side slot
> state. For a physical slot, restart_lsn advances only after the
> primary's walsender processes standby feedback. So the test needs an
> extra condition beyond ordinary standby catchup.
>
> The patch makes that dependency explicit: wait for the standby to
> replay the target LSN, then wait for the slot's restart_lsn on the
> primary to pass the same LSN. This keeps wait_for_catchup() focused on
> standby catchup while making the slot-specific synchronization visible
> in the test that needs it.

I agree with you.  But do we actually need a
wait_for_standby_and_slot_catchup() wrapper.  I think we can call
$node->wait_for_slot_catchup() directly and simplify the fix.  Check
the attached patch.

------
Regards,
Alexander Korotkov
Supabase


Attachments:

  [application/octet-stream] v2-0001-Stabilize-019_replslot_limit.pl-after-wait_for_ca.patch (3.7K, 2-v2-0001-Stabilize-019_replslot_limit.pl-after-wait_for_ca.patch)
  download | inline diff:
From 62335545e68ac14335467d3cfe5f4b3fdd2d5d25 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <[email protected]>
Date: Wed, 20 May 2026 15:12:37 +0300
Subject: [PATCH v2] Stabilize 019_replslot_limit.pl after wait_for_catchup()
 semantic change

wait_for_catchup() now returns as soon as the standby has replayed the
target LSN locally, rather than waiting until the primary observes that
position via pg_stat_replication.  019_replslot_limit.pl, however,
checks primary-side pg_replication_slots state, which depends on the
slot's restart_lsn -- and restart_lsn advances only after the primary's
walsender processes a standby reply.  The previous polling
wait_for_catchup() implicitly waited for that round trip; the
WAIT FOR LSN-based one does not, so the subtests "check that the slot
state changes to 'extended' / 'unreserved'" become flappy (reproducible
with small artificial delays in XLogWalRcvSendReply /
ProcessStandbyReplyMessage).

Replace each wait_for_catchup() in this test with
wait_for_slot_catchup('rep1', 'restart', primary->lsn('write')).
restart_lsn cannot move ahead of the standby's replayed position, so
this single wait transitively covers both the standby replay and the
primary's observation of it, which is exactly the precondition the
slot-state assertions require.

Reported-by: Alexander Lakhin <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Author: Xuneng Zhou <[email protected]>
Author: Alexander Korotkov <[email protected]>
---
 src/test/recovery/t/019_replslot_limit.pl | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 7b253e64d9c..472aa07587f 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -44,8 +44,12 @@ $node_standby->append_conf('postgresql.conf', "primary_slot_name = 'rep1'");
 
 $node_standby->start;
 
-# Wait until standby has replayed enough data
-$node_primary->wait_for_catchup($node_standby);
+# Wait until the primary has processed standby feedback and advanced
+# the slot's restart_lsn.  restart_lsn moves only after the standby's
+# reply reaches the walsender, so this transitively guarantees that
+# the standby itself has replayed past the target LSN.
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 
 # Stop standby
 $node_standby->stop;
@@ -79,7 +83,8 @@ is($result, "reserved|t", 'check that slot is working');
 # The standby can reconnect to primary
 $node_standby->start;
 
-$node_primary->wait_for_catchup($node_standby);
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 
 $node_standby->stop;
 
@@ -109,7 +114,8 @@ is($result, "reserved",
 
 # The standby can reconnect to primary
 $node_standby->start;
-$node_primary->wait_for_catchup($node_standby);
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 $node_standby->stop;
 
 # wal_keep_size overrides max_slot_wal_keep_size
@@ -128,7 +134,8 @@ $result = $node_primary->safe_psql('postgres',
 
 # The standby can reconnect to primary
 $node_standby->start;
-$node_primary->wait_for_catchup($node_standby);
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 $node_standby->stop;
 
 # Advance WAL again without checkpoint, reducing remain by 6 MB.
@@ -155,7 +162,8 @@ is($result, "unreserved|t",
 # The standby still can connect to primary before a checkpoint
 $node_standby->start;
 
-$node_primary->wait_for_catchup($node_standby);
+$node_primary->wait_for_slot_catchup('rep1', 'restart',
+	$node_primary->lsn('write'));
 
 $node_standby->stop;
 
-- 
2.39.5 (Apple Git-154)



^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-05-23 17:09  Xuneng Zhou <[email protected]>
  parent: Alexander Korotkov <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Xuneng Zhou @ 2026-05-23 17:09 UTC (permalink / raw)
  To: Alexander Korotkov <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Heikki Linnakangas <[email protected]>; Peter Eisentraut <[email protected]>; Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

Hi Alexander,

On Wed, May 20, 2026 at 5:17 AM Alexander Korotkov <[email protected]> wrote:
>
> Hi, Xuneng!
>
> On Wed, May 20, 2026 at 8:18 AM Xuneng Zhou <[email protected]> wrote:
> > On Tue, May 19, 2026 at 8:30 PM Xuneng Zhou <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, May 19, 2026 at 1:00 PM Alexander Lakhin <[email protected]> wrote:
> > > >
> > > > Hello Alexander and Xuneng,
> > > >
> > > > 06.04.2026 22:49, Alexander Korotkov wrote:
> > > >
> > > > Thank you, I've pushed your version of patchset.  I made two minor
> > > > corrections for patch #2: mention default mode value in the header
> > > > comment, and fallback to polling on has_wal_read_bug sparc64+ext4 bug.
> > > >
> > > >
> > > > I discovered a new test failure, that is apparently caused by new
> > > > wait_for_catchup() implementation [1]:
> > > > [06:20:23.110](1.069s) not ok 8 - check that the slot state changes to "extended"
> > > > [06:20:23.110](0.001s) #   Failed test 'check that the slot state changes to "extended"'
> > > > #   at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 140.
> > > > [06:20:23.111](0.000s) #          got: 'unreserved'
> > > > #     expected: 'extended'
> > > > [06:20:23.231](0.120s) not ok 9 - check that the slot state changes to "unreserved"
> > > > [06:20:23.231](0.000s) #   Failed test 'check that the slot state changes to "unreserved"'
> > > > #   at /Users/ec2-user/bf/goldfish/HEAD/pgsql/src/test/recovery/t/019_replslot_limit.pl line 152.
> > > > [06:20:23.231](0.000s) #          got: 'lost|'
> > > > #     expected: 'unreserved|t'
> > > >
> > > > I've managed to reproduce such failures with:
> > > > diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
> > > > index 07eac07b9ce..493ce92674e 100644
> > > > --- a/src/backend/replication/walreceiver.c
> > > > +++ b/src/backend/replication/walreceiver.c
> > > > @@ -1143,2 +1143,3 @@ XLogWalRcvSendReply(bool force, bool requestReply, bool checkApply)
> > > >
> > > > +pg_usleep(10000);
> > > >      /* Get current timestamp. */
> > > > diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
> > > > index 04aa770d981..19cda3a6b51 100644
> > > > --- a/src/backend/replication/walsender.c
> > > > +++ b/src/backend/replication/walsender.c
> > > > @@ -2521,2 +2521,3 @@ ProcessStandbyReplyMessage(void)
> > > >
> > > > +pg_usleep(100000);
> > > >      /* the caller already consumed the msgtype byte */
> > > >
> > > > Concretely, a loop:
> > > > for i in {1..100}; do echo "ITERATION $i"; PROVE_TESTS="t/019*" make -s check -C src/test/recovery/ || break; done
> > > > failed for me on iterations 2, 1, 7:
> > > > ITERATION 7
> > > > # +++ tap check in src/test/recovery +++
> > > > t/019_replslot_limit.pl .. 8/?
> > > > #   Failed test 'check that the slot state changes to "extended"'
> > > > #   at t/019_replslot_limit.pl line 140.
> > > > #          got: 'unreserved'
> > > > #     expected: 'extended'
> > > > t/019_replslot_limit.pl .. 26/? # Looks like you failed 1 test of 26.
> > > > t/019_replslot_limit.pl .. Dubious, test returned 1 (wstat 256, 0x100)
> > > > Failed 1/26 subtests
> > > >
> > > > With "WAIT FOR LSN" in wait_for_catchup() disabled, 100 iterations
> > > > passed.
> > > >
> > > > Having extra logging added, I could see the key difference.
> > > > Failed run:
> > > > 2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl LOG:  !!!GetWALAvailability| targetLSN: 0/016C0000, targetSeg: 22, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 22
> > > > 2026-05-19 22:01:37.968 EEST client backend[3632148] 019_replslot_limit.pl STATEMENT:  SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'
> > > > vs
> > > > Successful run:
> > > > 2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl LOG:  !!!GetWALAvailability| targetLSN: 0/01700000, targetSeg: 23, oldestSlotSeg: 23, oldestSegMaxWalSize: 24, oldestSeg: 23
> > > > 2026-05-19 22:04:18.102 EEST client backend[3633761] 019_replslot_limit.pl STATEMENT:  SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'
> > > >
> > > > That is, with WAIT FOR LSN, primary in this test may advance
> > > > slot->data.restart_lsn to the expected position after wait_for_catchup()
> > > > returns.
> > > >
> > > > [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=goldfish&dt=2026-05-13%2006%3A15%3A03
> > >
> > > Thanks for reporting this issue.
> > >
> > > I think this is related to the semantic change made earlier:
> > > wait_for_catchup() now returns once the standby itself has reached the
> > > target LSN, rather than waiting until the primary observes that
> > > position via pg_stat_replication.
> > >
> > > As a result, the primary may not yet have processed the standby
> > > feedback needed to advance the slot's restart_lsn when
> > > wait_for_catchup() returns.
> > >
> > > Actually, I was aware of this semantic change and previously thought
> > > it might be harmless. But this failure appears to disprove that. I'll
> > > prepare a patch to fix this shortly.
> >
> > After some consideration, 019_replslot_limit.pl appears to the better
> > place to place the fix rather than by restoring the old primary-side
> > polling barrier in wait_for_catchup().
> >
> > The new wait_for_catchup() behavior is closer to its natural
> > semantics: for replay/write/flush modes, it waits until the standby
> > itself has reached the requested LSN. The old implementation used
> > pg_stat_replication on the primary, which also implied that the
> > primary had received and processed standby feedback. That was a useful
> > side effect for this test, but it is not required by most callers.
> >
> > 019_replslot_limit.pl is different because it checks primary-side slot
> > state. For a physical slot, restart_lsn advances only after the
> > primary's walsender processes standby feedback. So the test needs an
> > extra condition beyond ordinary standby catchup.
> >
> > The patch makes that dependency explicit: wait for the standby to
> > replay the target LSN, then wait for the slot's restart_lsn on the
> > primary to pass the same LSN. This keeps wait_for_catchup() focused on
> > standby catchup while making the slot-specific synchronization visible
> > in the test that needs it.
>
> I agree with you.  But do we actually need a
> wait_for_standby_and_slot_catchup() wrapper.  I think we can call
> $node->wait_for_slot_catchup() directly and simplify the fix.  Check
> the attached patch.
>

The patch looks good to me. I agree that the wait_for_slot_catchup is
not needed and could be misleading. This change would make the exact
synchronization point and its intention clearer. The only price we
need to pay here is bringing back the polling. But it seems acceptable
since the cost was there in the pre-wait-for-lsn era. And thanks for
writing the great commit message!

-- 
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.






^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-05-23 18:40  Xuneng Zhou <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Xuneng Zhou @ 2026-05-23 18:40 UTC (permalink / raw)
  To: Alexander Korotkov <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Heikki Linnakangas <[email protected]>; Peter Eisentraut <[email protected]>; Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

>
>
> > I agree with you.  But do we actually need a
> > wait_for_standby_and_slot_catchup() wrapper.  I think we can call
> > $node->wait_for_slot_catchup() directly and simplify the fix.  Check
> > the attached patch.
> >
>
> The patch looks good to me. I agree that the wait_for_slot_catchup is
> not needed and could be misleading. This change would make the exact
> synchronization point and its intention clearer. The only price we
> need to pay here is bringing back the polling. But it seems acceptable
> since the cost was there in the pre-wait-for-lsn era. And thanks for
> writing the great commit message!
>

Sorry for copy-pasting the wrong function name. It should be
wait_for_catchup().

>

Regards,
Xuneng Zhou
HighGo Software Co., Ltd.


^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-05-25 09:00  Alexander Korotkov <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Alexander Korotkov @ 2026-05-25 09:00 UTC (permalink / raw)
  To: Xuneng Zhou <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Heikki Linnakangas <[email protected]>; Peter Eisentraut <[email protected]>; Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

On Sat, May 23, 2026 at 9:40 PM Xuneng Zhou <[email protected]> wrote:
>> > I agree with you.  But do we actually need a
>> > wait_for_standby_and_slot_catchup() wrapper.  I think we can call
>> > $node->wait_for_slot_catchup() directly and simplify the fix.  Check
>> > the attached patch.
>> >
>>
>> The patch looks good to me. I agree that the wait_for_slot_catchup is
>> not needed and could be misleading. This change would make the exact
>> synchronization point and its intention clearer. The only price we
>> need to pay here is bringing back the polling. But it seems acceptable
>> since the cost was there in the pre-wait-for-lsn era. And thanks for
>> writing the great commit message!
>
>
> Sorry for copy-pasting the wrong function name. It should be wait_for_catchup().

Good, thank you.  I'll push it if no objections.

------
Regards,
Alexander Korotkov
Supabase





^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: Implement waiting for wal lsn replay: reloaded
@ 2026-05-26 01:48  Xuneng Zhou <[email protected]>
  parent: Alexander Korotkov <[email protected]>
  0 siblings, 0 replies; 20+ messages in thread

From: Xuneng Zhou @ 2026-05-26 01:48 UTC (permalink / raw)
  To: Alexander Korotkov <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Heikki Linnakangas <[email protected]>; Peter Eisentraut <[email protected]>; Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Álvaro Herrera <[email protected]>; Chao Li <[email protected]>; pgsql-hackers <[email protected]>; Michael Paquier <[email protected]>; jian he <[email protected]>; Tomas Vondra <[email protected]>; Yura Sokolov <[email protected]>

On Mon, May 25, 2026 at 5:00 PM Alexander Korotkov <[email protected]> wrote:
>
> On Sat, May 23, 2026 at 9:40 PM Xuneng Zhou <[email protected]> wrote:
> >> > I agree with you.  But do we actually need a
> >> > wait_for_standby_and_slot_catchup() wrapper.  I think we can call
> >> > $node->wait_for_slot_catchup() directly and simplify the fix.  Check
> >> > the attached patch.
> >> >
> >>
> >> The patch looks good to me. I agree that the wait_for_slot_catchup is
> >> not needed and could be misleading. This change would make the exact
> >> synchronization point and its intention clearer. The only price we
> >> need to pay here is bringing back the polling. But it seems acceptable
> >> since the cost was there in the pre-wait-for-lsn era. And thanks for
> >> writing the great commit message!
> >
> >
> > Sorry for copy-pasting the wrong function name. It should be wait_for_catchup().
>
> Good, thank you.  I'll push it if no objections.

While reading 019_replslot_limit.pl, Codex pointed out a few
inconsistencies in the comments. I verified them and they look real.
Would you mind doing a small cleanup as well?

-- 
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.


Attachments:

  [application/octet-stream] v2-0002-Clean-up-replslot-limit-test-comments.patch (2.6K, 2-v2-0002-Clean-up-replslot-limit-test-comments.patch)
  download | inline diff:
From 4f311e78e316423de9114aff14d859ba7c6aa76e Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Tue, 26 May 2026 09:40:04 +0800
Subject: [PATCH v2 2/2] Clean up replslot limit test comments

Update stale comments and test names in 019_replslot_limit.pl to match
the actual WAL advancement and wal_status checks. Also remove a redundant
standby stop in the inactive_since coverage.
---
 src/test/recovery/t/019_replslot_limit.pl | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 472aa07587f..0984f999e13 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -60,7 +60,7 @@ $result = $node_primary->safe_psql('postgres',
 );
 is($result, "reserved|t", 'check the catching-up state');
 
-# Advance WAL by five segments (= 5MB) on primary
+# Advance WAL by one segment (= 1MB) on primary
 $node_primary->advance_wal(1);
 $node_primary->safe_psql('postgres', "CHECKPOINT;");
 
@@ -110,7 +110,7 @@ $node_primary->safe_psql('postgres', "CHECKPOINT;");
 $result = $node_primary->safe_psql('postgres',
 	"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'");
 is($result, "reserved",
-	'check that safe_wal_size gets close to the current LSN');
+	'check that slot remains reserved after advancing WAL');
 
 # The standby can reconnect to primary
 $node_standby->start;
@@ -121,7 +121,7 @@ $node_standby->stop;
 # wal_keep_size overrides max_slot_wal_keep_size
 $result = $node_primary->safe_psql('postgres',
 	"ALTER SYSTEM SET wal_keep_size to '8MB'; SELECT pg_reload_conf();");
-# Advance WAL again then checkpoint, reducing remain by 6 MB.
+# Advance WAL again, reducing remain by 6 MB.
 $node_primary->advance_wal(6);
 $result = $node_primary->safe_psql('postgres',
 	"SELECT wal_status as remain FROM pg_replication_slots WHERE slot_name = 'rep1'"
@@ -141,7 +141,7 @@ $node_standby->stop;
 # Advance WAL again without checkpoint, reducing remain by 6 MB.
 $node_primary->advance_wal(6);
 
-# Slot gets into 'reserved' state
+# Slot gets into 'extended' state
 $result = $node_primary->safe_psql('postgres',
 	"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep1'");
 is($result, "extended", 'check that the slot state changes to "extended"');
@@ -480,8 +480,6 @@ is( $primary4->safe_psql(
 	't',
 	'last inactive time for an inactive physical slot is updated correctly');
 
-$standby4->stop;
-
 # Testcase end: Check inactive_since property of the streaming standby's slot
 # =============================================================================
 
-- 
2.51.0



^ permalink  raw  reply  [nested|flat] 20+ messages in thread

end of thread, other threads:[~2026-05-26 01:48 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-01-07 00:32 Re: Implement waiting for wal lsn replay: reloaded Andres Freund <[email protected]>
2026-01-07 04:08 ` Xuneng Zhou <[email protected]>
2026-01-08 14:19   ` Alexander Korotkov <[email protected]>
2026-01-08 16:29     ` Xuneng Zhou <[email protected]>
2026-01-08 20:42       ` Alexander Korotkov <[email protected]>
2026-01-09 13:44         ` Xuneng Zhou <[email protected]>
2026-01-10 04:47           ` Xuneng Zhou <[email protected]>
2026-01-12 06:53             ` Xuneng Zhou <[email protected]>
2026-01-20 01:28               ` Xuneng Zhou <[email protected]>
2026-01-27 01:14                 ` Xuneng Zhou <[email protected]>
2026-01-29 07:47                   ` Alexander Korotkov <[email protected]>
2026-05-19 20:00                     ` Alexander Lakhin <[email protected]>
2026-05-20 03:30                       ` Xuneng Zhou <[email protected]>
2026-05-20 05:18                         ` Xuneng Zhou <[email protected]>
2026-05-20 12:16                           ` Alexander Korotkov <[email protected]>
2026-05-23 17:09                             ` Xuneng Zhou <[email protected]>
2026-05-23 18:40                               ` Xuneng Zhou <[email protected]>
2026-05-25 09:00                                 ` Alexander Korotkov <[email protected]>
2026-05-26 01:48                                   ` Xuneng Zhou <[email protected]>
2026-01-07 08:06 ` Alexander Korotkov <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox