public inbox for [email protected]  
help / color / mirror / Atom feed
From: Xuneng Zhou <[email protected]>
To: Alexander Korotkov <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Álvaro Herrera <[email protected]>
Cc: Chao Li <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: jian he <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Yura Sokolov <[email protected]>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: Mon, 12 Jan 2026 14:53:46 +0800
Message-ID: <CABPTF7U+SUnJX_woQYGe==R9Oz+-V6X0VO2stBLPGfJmH_LEhw@mail.gmail.com> (raw)
In-Reply-To: <CABPTF7X0n=R50z2fBpj3EbYYz04Ab0-DHJa+JfoAEny62QmUdg@mail.gmail.com>
References: <CABPTF7Xs-64GQNjmbimZNhj2YSKbBny+evz6=cp3X2fkJS+vMQ@mail.gmail.com>
	<[email protected]>
	<CAPpHfduDVNo4VXgnQFZUg9=2yHQJfUUqjokbi3qVxuJiKNfcwg@mail.gmail.com>
	<CABPTF7UkwQZGx5ub731Q+=+rU8yx4ruqMdDt__L_dm9_32LsMw@mail.gmail.com>
	<CAPpHfds-KiZRuCruc0jHxLSxLqzKcHJGwOFFA0b_RgaJvtUOEQ@mail.gmail.com>
	<CABPTF7Vy4A0S0W7=B-fVd9Bo_15XG_FRYfKUF2CB_i=5svwEZQ@mail.gmail.com>
	<CA+hUKG+L0OQR8dQtsNBSaA3FNNyQeFabdeRVzurm0b7Xtp592g@mail.gmail.com>
	<iuzfab2lr54iarbt5ijxtkdbrkv5qnioykluu7pg53hn7gv2te@hz2bkg4aw7bi>
	<CABPTF7WN_3kDPBYPxaKKcp2kO5BLB5bK_YGz70VTzTCivHibZA@mail.gmail.com>
	<CAPpHfdtu0zPDnpw1meVAeWaBXen245QmVP=NpOv2gcy4O9ObBA@mail.gmail.com>
	<CABPTF7Ub=w7CRxi3sNv8oMGMh4hCqUTohuiTuP9Y1DpxRuFtRQ@mail.gmail.com>
	<CAPpHfduJKv9-R2HcpyX9RNgteLL0M1MPS1No1WLnTsegsbG4MQ@mail.gmail.com>
	<CABPTF7WWxgAAr5fT9TFciU+PzeRpC3Dp7SO60AV9XWx561TNKA@mail.gmail.com>
	<CABPTF7X0n=R50z2fBpj3EbYYz04Ab0-DHJa+JfoAEny62QmUdg@mail.gmail.com>

Hi Alexander,

On Sat, Jan 10, 2026 at 12:47 PM Xuneng Zhou <[email protected]> wrote:
>
> On Fri, Jan 9, 2026 at 9:44 PM Xuneng Zhou <[email protected]> wrote:
> >
> > Hi,
> >
> > On Fri, Jan 9, 2026 at 4:42 AM Alexander Korotkov <[email protected]> wrote:
> > >
> > > On Thu, Jan 8, 2026 at 6:29 PM Xuneng Zhou <[email protected]> wrote:
> > > > On Thu, Jan 8, 2026 at 10:19 PM Alexander Korotkov <[email protected]> wrote:
> > > > > I see, you were right.  This is not related to the MyProc->xmin.
> > > > > ResolveRecoveryConflictWithTablespace() calls
> > > > > GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid).  That
> > > > > would kill WAIT FOR LSN query independently on its xmin.
> > > >
> > > > I think the concern is valid --- conflicts like
> > > > PROCSIG_RECOVERY_CONFLICT_SNAPSHOT could occur and terminate the
> > > > backend if the timing is unlucky. It's more difficult to reproduce
> > > > though. A check for the log containing "conflict with recovery" would
> > > > likely catch these conflicts as well.
> > >
> > > Yes, I found multiple reasons why xmin gets temporarily set during
> > > processing of WAIT FOR LSN query.  I'll soon post a draft patch to fix
> > > that.
> > >
> > > > > I guess your
> > > > > patch is the only way to go.  It's clumsy to wrap WAIT FOR LSN call
> > > > > with retry loop, but it would still consume less resources than
> > > > > polling.
> > > > >
> > > >
> > > > Assuming recovery conflicts are relatively rare in tap tests, except
> > > > for the explicitly designed tests like 031_recovery_conflict and the
> > > > narrow timing window that the standby has not caught up while the wait
> > > > for gets invoked, a simple fallback seems appropriate to me.
> > >
> > > Yes, I see.  Seems acceptable given this seems the only feasible way to go.
> > >
> >
> > Here is the updated patch with recovery conflicts handled.
>
> V2 corrected the commit message to state " if the WAIT FOR LSN session
> is interrupted by a recovery conflict (e.g., DROP TABLESPACE
> triggering conflicts on all backends),".  In this case, the statement
> is canceled when possible; in some states (idle in transaction or
> subtransaction) the session may be terminated.
>

The attached patch avoids a syscache lookup while constructing the
tuple descriptor for WAIT FOR LSN, so that a catalog snapshot is not
re-established after the wait finishes.

The standard output path (printtup) may still briefly establish a
catalog snapshot during result emission, but this seems acceptable:
the snapshot window is narrow to emit a single row. A fully
catalog-free output path would require either bypassing the
DestReceiver lifecycle (breaking layering) or adding a custom receiver
(added complexity for marginal benefit). The current approach is
simpler and might be sufficient unless output-phase conflicts are
observed a lot in practice. Does this make sense to you?

--
Best,
Xuneng


Attachments:

  [application/octet-stream] v1-0001-Avoid-syscache-lookup-in-WAIT-FOR-LSN-tuple-descr.patch (2.4K, 2-v1-0001-Avoid-syscache-lookup-in-WAIT-FOR-LSN-tuple-descr.patch)
  download | inline diff:
From 50beec4de4e078020b03667453458cd440c26267 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Mon, 12 Jan 2026 14:36:27 +0800
Subject: [PATCH v1] Avoid syscache lookup in WAIT FOR LSN tuple descriptor

Use TupleDescInitBuiltinEntry instead of TupleDescInitEntry when
building the result tuple descriptor for WAIT FOR LSN. This avoids
a syscache access that could re-establish a catalog snapshot after
we've explicitly released all snapshots before the wait.
---
 src/backend/commands/wait.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
index 97f1e778488..191c1877125 100644
--- a/src/backend/commands/wait.c
+++ b/src/backend/commands/wait.c
@@ -15,6 +15,7 @@
 
 #include <math.h>
 
+#include "access/tupdesc.h"
 #include "access/xlog.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogwait.h"
@@ -320,7 +321,17 @@ ExecWaitStmt(ParseState *pstate, WaitStmt *stmt, DestReceiver *dest)
 			break;
 	}
 
-	/* need a tuple descriptor representing a single TEXT column */
+	/*
+	 * Output the result.
+	 *
+	 * We use TupleDescInitBuiltinEntry in WaitStmtResultDesc to avoid
+	 * syscache access when building the tuple descriptor. The standard output
+	 * path may briefly establish a catalog snapshot during output, but this
+	 * is acceptable since: 1. The snapshot window is very brief (just
+	 * emitting one row) 2. The critical section (the wait itself) is already
+	 * snapshot-free 3. Using the standard path respects receiver lifecycle
+	 * and semantics
+	 */
 	tupdesc = WaitStmtResultDesc(stmt);
 
 	/* prepare for projection of tuples */
@@ -337,9 +348,16 @@ WaitStmtResultDesc(WaitStmt *stmt)
 {
 	TupleDesc	tupdesc;
 
-	/* Need a tuple descriptor representing a single TEXT  column */
+	/*
+	 * Need a tuple descriptor representing a single TEXT column.
+	 *
+	 * We use TupleDescInitBuiltinEntry instead of TupleDescInitEntry to avoid
+	 * syscache access. This is important because WaitStmtResultDesc may be
+	 * called after snapshots have been released, and we must not re-establish
+	 * a catalog snapshot which could cause recovery conflicts on a standby.
+	 */
 	tupdesc = CreateTemplateTupleDesc(1);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-					   TEXTOID, -1, 0);
+	TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "status",
+							  TEXTOID, -1, 0);
 	return tupdesc;
 }
-- 
2.51.0



reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Implement waiting for wal lsn replay: reloaded
  In-Reply-To: <CABPTF7U+SUnJX_woQYGe==R9Oz+-V6X0VO2stBLPGfJmH_LEhw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox