public inbox for [email protected]
help / color / mirror / Atom feedFrom: Xuneng Zhou <[email protected]>
To: Alexander Korotkov <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Álvaro Herrera <[email protected]>
Cc: Chao Li <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: jian he <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Yura Sokolov <[email protected]>
Subject: Re: Implement waiting for wal lsn replay: reloaded
Date: Mon, 12 Jan 2026 14:53:46 +0800
Message-ID: <CABPTF7U+SUnJX_woQYGe==R9Oz+-V6X0VO2stBLPGfJmH_LEhw@mail.gmail.com> (raw)
In-Reply-To: <CABPTF7X0n=R50z2fBpj3EbYYz04Ab0-DHJa+JfoAEny62QmUdg@mail.gmail.com>
References: <CABPTF7Xs-64GQNjmbimZNhj2YSKbBny+evz6=cp3X2fkJS+vMQ@mail.gmail.com>
<[email protected]>
<CAPpHfduDVNo4VXgnQFZUg9=2yHQJfUUqjokbi3qVxuJiKNfcwg@mail.gmail.com>
<CABPTF7UkwQZGx5ub731Q+=+rU8yx4ruqMdDt__L_dm9_32LsMw@mail.gmail.com>
<CAPpHfds-KiZRuCruc0jHxLSxLqzKcHJGwOFFA0b_RgaJvtUOEQ@mail.gmail.com>
<CABPTF7Vy4A0S0W7=B-fVd9Bo_15XG_FRYfKUF2CB_i=5svwEZQ@mail.gmail.com>
<CA+hUKG+L0OQR8dQtsNBSaA3FNNyQeFabdeRVzurm0b7Xtp592g@mail.gmail.com>
<iuzfab2lr54iarbt5ijxtkdbrkv5qnioykluu7pg53hn7gv2te@hz2bkg4aw7bi>
<CABPTF7WN_3kDPBYPxaKKcp2kO5BLB5bK_YGz70VTzTCivHibZA@mail.gmail.com>
<CAPpHfdtu0zPDnpw1meVAeWaBXen245QmVP=NpOv2gcy4O9ObBA@mail.gmail.com>
<CABPTF7Ub=w7CRxi3sNv8oMGMh4hCqUTohuiTuP9Y1DpxRuFtRQ@mail.gmail.com>
<CAPpHfduJKv9-R2HcpyX9RNgteLL0M1MPS1No1WLnTsegsbG4MQ@mail.gmail.com>
<CABPTF7WWxgAAr5fT9TFciU+PzeRpC3Dp7SO60AV9XWx561TNKA@mail.gmail.com>
<CABPTF7X0n=R50z2fBpj3EbYYz04Ab0-DHJa+JfoAEny62QmUdg@mail.gmail.com>
Hi Alexander,
On Sat, Jan 10, 2026 at 12:47 PM Xuneng Zhou <[email protected]> wrote:
>
> On Fri, Jan 9, 2026 at 9:44 PM Xuneng Zhou <[email protected]> wrote:
> >
> > Hi,
> >
> > On Fri, Jan 9, 2026 at 4:42 AM Alexander Korotkov <[email protected]> wrote:
> > >
> > > On Thu, Jan 8, 2026 at 6:29 PM Xuneng Zhou <[email protected]> wrote:
> > > > On Thu, Jan 8, 2026 at 10:19 PM Alexander Korotkov <[email protected]> wrote:
> > > > > I see, you were right. This is not related to the MyProc->xmin.
> > > > > ResolveRecoveryConflictWithTablespace() calls
> > > > > GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid). That
> > > > > would kill WAIT FOR LSN query independently on its xmin.
> > > >
> > > > I think the concern is valid --- conflicts like
> > > > PROCSIG_RECOVERY_CONFLICT_SNAPSHOT could occur and terminate the
> > > > backend if the timing is unlucky. It's more difficult to reproduce
> > > > though. A check for the log containing "conflict with recovery" would
> > > > likely catch these conflicts as well.
> > >
> > > Yes, I found multiple reasons why xmin gets temporarily set during
> > > processing of WAIT FOR LSN query. I'll soon post a draft patch to fix
> > > that.
> > >
> > > > > I guess your
> > > > > patch is the only way to go. It's clumsy to wrap WAIT FOR LSN call
> > > > > with retry loop, but it would still consume less resources than
> > > > > polling.
> > > > >
> > > >
> > > > Assuming recovery conflicts are relatively rare in tap tests, except
> > > > for the explicitly designed tests like 031_recovery_conflict and the
> > > > narrow timing window that the standby has not caught up while the wait
> > > > for gets invoked, a simple fallback seems appropriate to me.
> > >
> > > Yes, I see. Seems acceptable given this seems the only feasible way to go.
> > >
> >
> > Here is the updated patch with recovery conflicts handled.
>
> V2 corrected the commit message to state " if the WAIT FOR LSN session
> is interrupted by a recovery conflict (e.g., DROP TABLESPACE
> triggering conflicts on all backends),". In this case, the statement
> is canceled when possible; in some states (idle in transaction or
> subtransaction) the session may be terminated.
>
The attached patch avoids a syscache lookup while constructing the
tuple descriptor for WAIT FOR LSN, so that a catalog snapshot is not
re-established after the wait finishes.
The standard output path (printtup) may still briefly establish a
catalog snapshot during result emission, but this seems acceptable:
the snapshot window is narrow to emit a single row. A fully
catalog-free output path would require either bypassing the
DestReceiver lifecycle (breaking layering) or adding a custom receiver
(added complexity for marginal benefit). The current approach is
simpler and might be sufficient unless output-phase conflicts are
observed a lot in practice. Does this make sense to you?
--
Best,
Xuneng
Attachments:
[application/octet-stream] v1-0001-Avoid-syscache-lookup-in-WAIT-FOR-LSN-tuple-descr.patch (2.4K, 2-v1-0001-Avoid-syscache-lookup-in-WAIT-FOR-LSN-tuple-descr.patch)
download | inline diff:
From 50beec4de4e078020b03667453458cd440c26267 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Mon, 12 Jan 2026 14:36:27 +0800
Subject: [PATCH v1] Avoid syscache lookup in WAIT FOR LSN tuple descriptor
Use TupleDescInitBuiltinEntry instead of TupleDescInitEntry when
building the result tuple descriptor for WAIT FOR LSN. This avoids
a syscache access that could re-establish a catalog snapshot after
we've explicitly released all snapshots before the wait.
---
src/backend/commands/wait.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
index 97f1e778488..191c1877125 100644
--- a/src/backend/commands/wait.c
+++ b/src/backend/commands/wait.c
@@ -15,6 +15,7 @@
#include <math.h>
+#include "access/tupdesc.h"
#include "access/xlog.h"
#include "access/xlogrecovery.h"
#include "access/xlogwait.h"
@@ -320,7 +321,17 @@ ExecWaitStmt(ParseState *pstate, WaitStmt *stmt, DestReceiver *dest)
break;
}
- /* need a tuple descriptor representing a single TEXT column */
+ /*
+ * Output the result.
+ *
+ * We use TupleDescInitBuiltinEntry in WaitStmtResultDesc to avoid
+ * syscache access when building the tuple descriptor. The standard output
+ * path may briefly establish a catalog snapshot during output, but this
+ * is acceptable since: 1. The snapshot window is very brief (just
+ * emitting one row) 2. The critical section (the wait itself) is already
+ * snapshot-free 3. Using the standard path respects receiver lifecycle
+ * and semantics
+ */
tupdesc = WaitStmtResultDesc(stmt);
/* prepare for projection of tuples */
@@ -337,9 +348,16 @@ WaitStmtResultDesc(WaitStmt *stmt)
{
TupleDesc tupdesc;
- /* Need a tuple descriptor representing a single TEXT column */
+ /*
+ * Need a tuple descriptor representing a single TEXT column.
+ *
+ * We use TupleDescInitBuiltinEntry instead of TupleDescInitEntry to avoid
+ * syscache access. This is important because WaitStmtResultDesc may be
+ * called after snapshots have been released, and we must not re-establish
+ * a catalog snapshot which could cause recovery conflicts on a standby.
+ */
tupdesc = CreateTemplateTupleDesc(1);
- TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
- TEXTOID, -1, 0);
+ TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "status",
+ TEXTOID, -1, 0);
return tupdesc;
}
--
2.51.0
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Implement waiting for wal lsn replay: reloaded
In-Reply-To: <CABPTF7U+SUnJX_woQYGe==R9Oz+-V6X0VO2stBLPGfJmH_LEhw@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox