Re: Implement waiting for wal lsn replay: reloaded

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Xuneng Zhou <[email protected]>
To: Alexander Korotkov <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Álvaro Herrera <[email protected]>
Cc: Chao Li <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: jian he <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Yura Sokolov <[email protected]>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: Fri, 9 Jan 2026 21:44:17 +0800
Message-ID: <CABPTF7WWxgAAr5fT9TFciU+PzeRpC3Dp7SO60AV9XWx561TNKA@mail.gmail.com> (raw)
In-Reply-To: <CAPpHfduJKv9-R2HcpyX9RNgteLL0M1MPS1No1WLnTsegsbG4MQ@mail.gmail.com>
References: <CABPTF7Xs-64GQNjmbimZNhj2YSKbBny+evz6=cp3X2fkJS+vMQ@mail.gmail.com>
	<[email protected]>
	<CAPpHfduDVNo4VXgnQFZUg9=2yHQJfUUqjokbi3qVxuJiKNfcwg@mail.gmail.com>
	<CABPTF7UkwQZGx5ub731Q+=+rU8yx4ruqMdDt__L_dm9_32LsMw@mail.gmail.com>
	<CAPpHfds-KiZRuCruc0jHxLSxLqzKcHJGwOFFA0b_RgaJvtUOEQ@mail.gmail.com>
	<CABPTF7Vy4A0S0W7=B-fVd9Bo_15XG_FRYfKUF2CB_i=5svwEZQ@mail.gmail.com>
	<CA+hUKG+L0OQR8dQtsNBSaA3FNNyQeFabdeRVzurm0b7Xtp592g@mail.gmail.com>
	<iuzfab2lr54iarbt5ijxtkdbrkv5qnioykluu7pg53hn7gv2te@hz2bkg4aw7bi>
	<CABPTF7WN_3kDPBYPxaKKcp2kO5BLB5bK_YGz70VTzTCivHibZA@mail.gmail.com>
	<CAPpHfdtu0zPDnpw1meVAeWaBXen245QmVP=NpOv2gcy4O9ObBA@mail.gmail.com>
	<CABPTF7Ub=w7CRxi3sNv8oMGMh4hCqUTohuiTuP9Y1DpxRuFtRQ@mail.gmail.com>
	<CAPpHfduJKv9-R2HcpyX9RNgteLL0M1MPS1No1WLnTsegsbG4MQ@mail.gmail.com>

Hi,

On Fri, Jan 9, 2026 at 4:42 AM Alexander Korotkov <[email protected]> wrote:
>
> On Thu, Jan 8, 2026 at 6:29 PM Xuneng Zhou <[email protected]> wrote:
> > On Thu, Jan 8, 2026 at 10:19 PM Alexander Korotkov <[email protected]> wrote:
> > > I see, you were right.  This is not related to the MyProc->xmin.
> > > ResolveRecoveryConflictWithTablespace() calls
> > > GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid).  That
> > > would kill WAIT FOR LSN query independently on its xmin.
> >
> > I think the concern is valid --- conflicts like
> > PROCSIG_RECOVERY_CONFLICT_SNAPSHOT could occur and terminate the
> > backend if the timing is unlucky. It's more difficult to reproduce
> > though. A check for the log containing "conflict with recovery" would
> > likely catch these conflicts as well.
>
> Yes, I found multiple reasons why xmin gets temporarily set during
> processing of WAIT FOR LSN query.  I'll soon post a draft patch to fix
> that.
>
> > > I guess your
> > > patch is the only way to go.  It's clumsy to wrap WAIT FOR LSN call
> > > with retry loop, but it would still consume less resources than
> > > polling.
> > >
> >
> > Assuming recovery conflicts are relatively rare in tap tests, except
> > for the explicitly designed tests like 031_recovery_conflict and the
> > narrow timing window that the standby has not caught up while the wait
> > for gets invoked, a simple fallback seems appropriate to me.
>
> Yes, I see.  Seems acceptable given this seems the only feasible way to go.
>

Here is the updated patch with recovery conflicts handled.

-- 
Best,
Xuneng


Attachments:

  [application/octet-stream] v1-0001-Use-WAIT-FOR-LSN-in-PostgreSQL-Test-Cluster-wait_.patch (6.3K, 2-v1-0001-Use-WAIT-FOR-LSN-in-PostgreSQL-Test-Cluster-wait_.patch)
  download | inline diff:
From 1b1aa652aff6681e5f43eba4f4690b174052d478 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Fri, 9 Jan 2026 21:32:12 +0800
Subject: [PATCH v1] Use WAIT FOR LSN in
 PostgreSQL::Test::Cluster::wait_for_catchup()

When the standby is passed as a PostgreSQL::Test::Cluster instance,
use the WAIT FOR LSN command on the standby server to implement
wait_for_catchup() for replay, write, and flush modes.  This is more
efficient than polling pg_stat_replication on the upstream, as the
WAIT FOR LSN command uses a latch-based wakeup mechanism.

The optimization applies when:
- The standby is passed as a Cluster object (not just a name string)
- The mode is 'replay', 'write', or 'flush' (not 'sent')
- The standby is in recovery

For 'sent' mode, when the standby is passed as a string (e.g., a
subscription name for logical replication), or when the standby has
been promoted, the function falls back to the original polling-based
approach using pg_stat_replication on the upstream.

Additionally, if the WAIT FOR LSN session is killed by a recovery
conflict (e.g., DROP TABLESPACE killing all backends indiscriminately),
the function catches this error and falls back to polling.  This makes
the test infrastructure robust against the timing-dependent conflicts
that can occur in tests like 031_recovery_conflict.

Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com
Author: Xuneng Zhou <[email protected]>
Reviewed-by: Alexander Korotkov <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
---
 src/test/perl/PostgreSQL/Test/Cluster.pm | 91 +++++++++++++++++++++++-
 1 file changed, 90 insertions(+), 1 deletion(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 955dfc0e7f8..87c3d2750cb 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -3320,6 +3320,13 @@ If you pass an explicit value of target_lsn, it should almost always be
 the primary's write LSN; so this parameter is seldom needed except when
 querying some intermediate replication node rather than the primary.
 
+When the standby is passed as a PostgreSQL::Test::Cluster instance and is
+in recovery, this function uses the WAIT FOR LSN command on the standby
+for modes replay, write, and flush.  This is more efficient than polling
+pg_stat_replication on the upstream, as WAIT FOR LSN uses a latch-based
+wakeup mechanism.  For 'sent' mode, or when the standby is passed as a
+string (e.g., a subscription name), it falls back to polling.
+
 If there is no active replication connection from this peer, waits until
 poll_query_until timeout.
 
@@ -3339,10 +3346,13 @@ sub wait_for_catchup
 	  . join(', ', keys(%valid_modes))
 	  unless exists($valid_modes{$mode});
 
-	# Allow passing of a PostgreSQL::Test::Cluster instance as shorthand
+	# Keep a reference to the standby node if passed as an object, so we can
+	# use WAIT FOR LSN on it later.
+	my $standby_node;
 	if (blessed($standby_name)
 		&& $standby_name->isa("PostgreSQL::Test::Cluster"))
 	{
+		$standby_node = $standby_name;
 		$standby_name = $standby_name->name;
 	}
 	if (!defined($target_lsn))
@@ -3367,6 +3377,85 @@ sub wait_for_catchup
 	  . $self->name . "\n";
 	# Before release 12 walreceiver just set the application name to
 	# "walreceiver"
+
+	# Use WAIT FOR LSN on the standby when:
+	# - The standby was passed as a Cluster object (so we can connect to it)
+	# - The mode is replay, write, or flush (not 'sent')
+	# - The standby is in recovery
+	# This is more efficient than polling pg_stat_replication on the upstream,
+	# as WAIT FOR LSN uses a latch-based wakeup mechanism.
+	if (defined($standby_node) && ($mode ne 'sent'))
+	{
+		my $standby_in_recovery =
+		  $standby_node->safe_psql('postgres', "SELECT pg_is_in_recovery()");
+		chomp($standby_in_recovery);
+
+		if ($standby_in_recovery eq 't')
+		{
+			# Map mode names to WAIT FOR LSN mode names
+			my %mode_map = (
+				'replay' => 'standby_replay',
+				'write'  => 'standby_write',
+				'flush'  => 'standby_flush',
+			);
+			my $wait_mode = $mode_map{$mode};
+			my $timeout = $PostgreSQL::Test::Utils::timeout_default;
+			my $wait_query =
+			  qq[WAIT FOR LSN '${target_lsn}' WITH (MODE '${wait_mode}', timeout '${timeout}s', no_throw);];
+
+			# Try WAIT FOR LSN. If it succeeds, we're done. If it returns a
+			# non-success status (timeout, not_in_recovery), fail immediately.
+			# If the session is interrupted (e.g., killed by recovery conflict),
+			# fall back to polling on the upstream which is immune to standby-
+			# side conflicts.
+			my $output;
+			local $@;
+			my $wait_succeeded = eval {
+				$output = $standby_node->safe_psql('postgres', $wait_query);
+				chomp($output);
+				1;
+			};
+
+			if ($wait_succeeded && $output eq 'success')
+			{
+				print "done\n";
+				return;
+			}
+
+			# If WAIT FOR LSN executed but returned non-success (e.g., timeout,
+			# not_in_recovery), fail immediately with diagnostic info. Falling
+			# back to polling would just waste time.
+			if ($wait_succeeded)
+			{
+				my $details = $self->safe_psql('postgres',
+					"SELECT * FROM pg_catalog.pg_stat_replication");
+				diag qq(WAIT FOR LSN returned '$output'
+pg_stat_replication on upstream:
+${details});
+				croak "WAIT FOR LSN '$wait_mode' to '$target_lsn' returned '$output'";
+			}
+
+			# WAIT FOR LSN was interrupted. Only fall back to polling if this
+			# looks like a recovery conflict - the canonical PostgreSQL error
+			# message contains "conflict with recovery". Other errors should
+			# fail immediately rather than being masked by a silent fallback.
+			if ($@ =~ /conflict with recovery/i)
+			{
+				diag qq(WAIT FOR LSN interrupted, falling back to polling:
+$@);
+			}
+			else
+			{
+				croak "WAIT FOR LSN failed: $@";
+			}
+		}
+	}
+
+	# Fall back to polling pg_stat_replication on the upstream for:
+	# - 'sent' mode (no corresponding WAIT FOR LSN mode)
+	# - When standby_name is a string (e.g., subscription name)
+	# - When the standby is no longer in recovery (was promoted)
+	# - When WAIT FOR LSN was interrupted (e.g., killed by a recovery conflict)
 	my $query = qq[SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming'
          FROM pg_catalog.pg_stat_replication
          WHERE application_name IN ('$standby_name', 'walreceiver')];
-- 
2.51.0

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Implement waiting for wal lsn replay: reloaded
  In-Reply-To: <CABPTF7WWxgAAr5fT9TFciU+PzeRpC3Dp7SO60AV9XWx561TNKA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox