public inbox for [email protected]
help / color / mirror / Atom feedFrom: Xuneng Zhou <[email protected]>
To: Alexander Korotkov <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Álvaro Herrera <[email protected]>
Cc: Chao Li <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: jian he <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Yura Sokolov <[email protected]>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: Fri, 9 Jan 2026 21:44:17 +0800
Message-ID: <CABPTF7WWxgAAr5fT9TFciU+PzeRpC3Dp7SO60AV9XWx561TNKA@mail.gmail.com> (raw)
In-Reply-To: <CAPpHfduJKv9-R2HcpyX9RNgteLL0M1MPS1No1WLnTsegsbG4MQ@mail.gmail.com>
References: <CABPTF7Xs-64GQNjmbimZNhj2YSKbBny+evz6=cp3X2fkJS+vMQ@mail.gmail.com>
<[email protected]>
<CAPpHfduDVNo4VXgnQFZUg9=2yHQJfUUqjokbi3qVxuJiKNfcwg@mail.gmail.com>
<CABPTF7UkwQZGx5ub731Q+=+rU8yx4ruqMdDt__L_dm9_32LsMw@mail.gmail.com>
<CAPpHfds-KiZRuCruc0jHxLSxLqzKcHJGwOFFA0b_RgaJvtUOEQ@mail.gmail.com>
<CABPTF7Vy4A0S0W7=B-fVd9Bo_15XG_FRYfKUF2CB_i=5svwEZQ@mail.gmail.com>
<CA+hUKG+L0OQR8dQtsNBSaA3FNNyQeFabdeRVzurm0b7Xtp592g@mail.gmail.com>
<iuzfab2lr54iarbt5ijxtkdbrkv5qnioykluu7pg53hn7gv2te@hz2bkg4aw7bi>
<CABPTF7WN_3kDPBYPxaKKcp2kO5BLB5bK_YGz70VTzTCivHibZA@mail.gmail.com>
<CAPpHfdtu0zPDnpw1meVAeWaBXen245QmVP=NpOv2gcy4O9ObBA@mail.gmail.com>
<CABPTF7Ub=w7CRxi3sNv8oMGMh4hCqUTohuiTuP9Y1DpxRuFtRQ@mail.gmail.com>
<CAPpHfduJKv9-R2HcpyX9RNgteLL0M1MPS1No1WLnTsegsbG4MQ@mail.gmail.com>
Hi,
On Fri, Jan 9, 2026 at 4:42 AM Alexander Korotkov <[email protected]> wrote:
>
> On Thu, Jan 8, 2026 at 6:29 PM Xuneng Zhou <[email protected]> wrote:
> > On Thu, Jan 8, 2026 at 10:19 PM Alexander Korotkov <[email protected]> wrote:
> > > I see, you were right. This is not related to the MyProc->xmin.
> > > ResolveRecoveryConflictWithTablespace() calls
> > > GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid). That
> > > would kill WAIT FOR LSN query independently on its xmin.
> >
> > I think the concern is valid --- conflicts like
> > PROCSIG_RECOVERY_CONFLICT_SNAPSHOT could occur and terminate the
> > backend if the timing is unlucky. It's more difficult to reproduce
> > though. A check for the log containing "conflict with recovery" would
> > likely catch these conflicts as well.
>
> Yes, I found multiple reasons why xmin gets temporarily set during
> processing of WAIT FOR LSN query. I'll soon post a draft patch to fix
> that.
>
> > > I guess your
> > > patch is the only way to go. It's clumsy to wrap WAIT FOR LSN call
> > > with retry loop, but it would still consume less resources than
> > > polling.
> > >
> >
> > Assuming recovery conflicts are relatively rare in tap tests, except
> > for the explicitly designed tests like 031_recovery_conflict and the
> > narrow timing window that the standby has not caught up while the wait
> > for gets invoked, a simple fallback seems appropriate to me.
>
> Yes, I see. Seems acceptable given this seems the only feasible way to go.
>
Here is the updated patch with recovery conflicts handled.
--
Best,
Xuneng
Attachments:
[application/octet-stream] v1-0001-Use-WAIT-FOR-LSN-in-PostgreSQL-Test-Cluster-wait_.patch (6.3K, 2-v1-0001-Use-WAIT-FOR-LSN-in-PostgreSQL-Test-Cluster-wait_.patch)
download | inline diff:
From 1b1aa652aff6681e5f43eba4f4690b174052d478 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Fri, 9 Jan 2026 21:32:12 +0800
Subject: [PATCH v1] Use WAIT FOR LSN in
PostgreSQL::Test::Cluster::wait_for_catchup()
When the standby is passed as a PostgreSQL::Test::Cluster instance,
use the WAIT FOR LSN command on the standby server to implement
wait_for_catchup() for replay, write, and flush modes. This is more
efficient than polling pg_stat_replication on the upstream, as the
WAIT FOR LSN command uses a latch-based wakeup mechanism.
The optimization applies when:
- The standby is passed as a Cluster object (not just a name string)
- The mode is 'replay', 'write', or 'flush' (not 'sent')
- The standby is in recovery
For 'sent' mode, when the standby is passed as a string (e.g., a
subscription name for logical replication), or when the standby has
been promoted, the function falls back to the original polling-based
approach using pg_stat_replication on the upstream.
Additionally, if the WAIT FOR LSN session is killed by a recovery
conflict (e.g., DROP TABLESPACE killing all backends indiscriminately),
the function catches this error and falls back to polling. This makes
the test infrastructure robust against the timing-dependent conflicts
that can occur in tests like 031_recovery_conflict.
Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com
Author: Xuneng Zhou <[email protected]>
Reviewed-by: Alexander Korotkov <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Alvaro Herrera <[email protected]>
---
src/test/perl/PostgreSQL/Test/Cluster.pm | 91 +++++++++++++++++++++++-
1 file changed, 90 insertions(+), 1 deletion(-)
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 955dfc0e7f8..87c3d2750cb 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -3320,6 +3320,13 @@ If you pass an explicit value of target_lsn, it should almost always be
the primary's write LSN; so this parameter is seldom needed except when
querying some intermediate replication node rather than the primary.
+When the standby is passed as a PostgreSQL::Test::Cluster instance and is
+in recovery, this function uses the WAIT FOR LSN command on the standby
+for modes replay, write, and flush. This is more efficient than polling
+pg_stat_replication on the upstream, as WAIT FOR LSN uses a latch-based
+wakeup mechanism. For 'sent' mode, or when the standby is passed as a
+string (e.g., a subscription name), it falls back to polling.
+
If there is no active replication connection from this peer, waits until
poll_query_until timeout.
@@ -3339,10 +3346,13 @@ sub wait_for_catchup
. join(', ', keys(%valid_modes))
unless exists($valid_modes{$mode});
- # Allow passing of a PostgreSQL::Test::Cluster instance as shorthand
+ # Keep a reference to the standby node if passed as an object, so we can
+ # use WAIT FOR LSN on it later.
+ my $standby_node;
if (blessed($standby_name)
&& $standby_name->isa("PostgreSQL::Test::Cluster"))
{
+ $standby_node = $standby_name;
$standby_name = $standby_name->name;
}
if (!defined($target_lsn))
@@ -3367,6 +3377,85 @@ sub wait_for_catchup
. $self->name . "\n";
# Before release 12 walreceiver just set the application name to
# "walreceiver"
+
+ # Use WAIT FOR LSN on the standby when:
+ # - The standby was passed as a Cluster object (so we can connect to it)
+ # - The mode is replay, write, or flush (not 'sent')
+ # - The standby is in recovery
+ # This is more efficient than polling pg_stat_replication on the upstream,
+ # as WAIT FOR LSN uses a latch-based wakeup mechanism.
+ if (defined($standby_node) && ($mode ne 'sent'))
+ {
+ my $standby_in_recovery =
+ $standby_node->safe_psql('postgres', "SELECT pg_is_in_recovery()");
+ chomp($standby_in_recovery);
+
+ if ($standby_in_recovery eq 't')
+ {
+ # Map mode names to WAIT FOR LSN mode names
+ my %mode_map = (
+ 'replay' => 'standby_replay',
+ 'write' => 'standby_write',
+ 'flush' => 'standby_flush',
+ );
+ my $wait_mode = $mode_map{$mode};
+ my $timeout = $PostgreSQL::Test::Utils::timeout_default;
+ my $wait_query =
+ qq[WAIT FOR LSN '${target_lsn}' WITH (MODE '${wait_mode}', timeout '${timeout}s', no_throw);];
+
+ # Try WAIT FOR LSN. If it succeeds, we're done. If it returns a
+ # non-success status (timeout, not_in_recovery), fail immediately.
+ # If the session is interrupted (e.g., killed by recovery conflict),
+ # fall back to polling on the upstream which is immune to standby-
+ # side conflicts.
+ my $output;
+ local $@;
+ my $wait_succeeded = eval {
+ $output = $standby_node->safe_psql('postgres', $wait_query);
+ chomp($output);
+ 1;
+ };
+
+ if ($wait_succeeded && $output eq 'success')
+ {
+ print "done\n";
+ return;
+ }
+
+ # If WAIT FOR LSN executed but returned non-success (e.g., timeout,
+ # not_in_recovery), fail immediately with diagnostic info. Falling
+ # back to polling would just waste time.
+ if ($wait_succeeded)
+ {
+ my $details = $self->safe_psql('postgres',
+ "SELECT * FROM pg_catalog.pg_stat_replication");
+ diag qq(WAIT FOR LSN returned '$output'
+pg_stat_replication on upstream:
+${details});
+ croak "WAIT FOR LSN '$wait_mode' to '$target_lsn' returned '$output'";
+ }
+
+ # WAIT FOR LSN was interrupted. Only fall back to polling if this
+ # looks like a recovery conflict - the canonical PostgreSQL error
+ # message contains "conflict with recovery". Other errors should
+ # fail immediately rather than being masked by a silent fallback.
+ if ($@ =~ /conflict with recovery/i)
+ {
+ diag qq(WAIT FOR LSN interrupted, falling back to polling:
+$@);
+ }
+ else
+ {
+ croak "WAIT FOR LSN failed: $@";
+ }
+ }
+ }
+
+ # Fall back to polling pg_stat_replication on the upstream for:
+ # - 'sent' mode (no corresponding WAIT FOR LSN mode)
+ # - When standby_name is a string (e.g., subscription name)
+ # - When the standby is no longer in recovery (was promoted)
+ # - When WAIT FOR LSN was interrupted (e.g., killed by a recovery conflict)
my $query = qq[SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming'
FROM pg_catalog.pg_stat_replication
WHERE application_name IN ('$standby_name', 'walreceiver')];
--
2.51.0
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Implement waiting for wal lsn replay: reloaded
In-Reply-To: <CABPTF7WWxgAAr5fT9TFciU+PzeRpC3Dp7SO60AV9XWx561TNKA@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox