public inbox for [email protected]
help / color / mirror / Atom feedFrom: Zhijie Hou (Fujitsu) <[email protected]>
To: Amit Kapila <[email protected]>
Cc: Fujii Masao <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: RE: Fix slotsync worker busy loop causing repeated log messages
Date: Tue, 3 Mar 2026 12:24:33 +0000
Message-ID: <OS7PR01MB169097E45A7A84AF631534306947FA@OS7PR01MB16909.jpnprd01.prod.outlook.com> (raw)
In-Reply-To: <CAA4eK1LGDd2y7Bnj9rHEJLzJx4vThF23+jH9j8bZjuMard9RRA@mail.gmail.com>
References: <CAHGQGwF6zG9Z8ws1yb3hY1VqV-WT7hR0qyXCn2HdbjvZQKufDw@mail.gmail.com>
<CAA4eK1KLk+TWyNPJ=z6SzQQXySc-N9Gs3eR-QKfV+MX7vfJWiw@mail.gmail.com>
<OS7PR01MB16909C13530D84781E7C2E2EF947FA@OS7PR01MB16909.jpnprd01.prod.outlook.com>
<CAA4eK1LGDd2y7Bnj9rHEJLzJx4vThF23+jH9j8bZjuMard9RRA@mail.gmail.com>
On Tuesday, March 3, 2026 6:25 PM Amit Kapila <[email protected]> wrote:
> On Tue, Mar 3, 2026 at 1:12 PM Zhijie Hou (Fujitsu) <[email protected]>
> wrote:
> >
> > On Saturday, February 28, 2026 1:03 PM Amit Kapila
> <[email protected]> wrote:
> > > On Fri, Feb 27, 2026 at 8:34 PM Fujii Masao <[email protected]>
> wrote:
> > > >
> > > > Normally, the slotsync worker updates the standby slot using the
> > > > primary's slot state. However, when confirmed_flush_lsn matches
> > > > but restart_lsn does not, the worker does not actually update the
> > > > standby slot. Despite that, the current code of
> > > > update_local_synced_slot() appears to treat this situation as if
> > > > an update occurred. As a result, the worker sleeps only for the
> > > > minimum interval (200 ms) before retrying. In the next cycle, it
> > > > again assumes an update happened, and continues looping with the
> > > > short sleep interval, causing the repeated logical decoding log
> > > > messages. Based on a quick analysis, this seems to be
> > > the root cause.
> > > >
> > > > I think update_local_synced_slot() should return false (i.e., no
> > > > update
> > > > happened) when confirmed_flush_lsn is equal but restart_lsn
> > > > differs between primary and standby.
> > > >
> > >
> > > We expect that in such a case update_local_synced_slot() should
> > > advance local_slot's 'restart_lsn' via
> > > LogicalSlotAdvanceAndCheckSnapState(),
> > > otherwise, it won't go in the cheap code path next time. Normally,
> > > restart_lsn advancement should happen when we process
> > > XLOG_RUNNING_XACTS and call SnapBuildProcessRunningXacts(). In this
> > > particular case as both restart_lsn and confirmed_flush_lsn are the
> > > same (0/03000140), the machinery may not be processing
> > > XLOG_RUNNING_XACTS record. I have not debugged the exact case yet
> > > but you can try by emitting some more records on publisher, it
> > > should let the standby advance the slot. It is possible that we can
> > > do something like you are proposing to silence the LOG messages but we
> should know what is going on here.
> >
> > I reproduced and debugged this issue where a replication slot's
> > restart_lsn fails to advance. In my environment, I found it only
> > occurs when a synced slot first builds a consistent snapshot. The
> > problematic code path is in
> > SnapBuildProcessRunningXacts():
> >
> > if (builder->state < SNAPBUILD_CONSISTENT)
> > {
> > /* returns false if there's no point in performing cleanup just yet */
> > if (!SnapBuildFindSnapshot(builder, lsn, running))
> > return;
> > }
> >
> > When a synced slot reaches consistency for the first time with no
> > running transactions, SnapBuildFindSnapshot() returns false, causing
> > the function to return without updating the candidate restart_lsn.
> >
> > So, an alternative approach is to improve this logic by updating the
> > candidate restart_lsn in this case instead of returning early.
> >
>
> But why not return 'true' from SnapBuildFindSnapshot() in that case?
> The comment atop SnapBuildFindSnapshot() says: "Returns true if there is a
> point in performing internal maintenance/cleanup using the xl_running_xacts
> record.". Doesn't updating restart_lsn fall under that category?
After re-thinking the return value semantics, I realized that
SnapBuildFindSnapshot() may no longer need a return value. In the revised
logic, the function would always return true, making the return value
redundant.
To elaborate: the original SnapBuildFindSnapshot() returned:
- true: when snapshot is built incrementally (after waiting for running
transactions to complete)
- false: when snapshot is built immediately (restoring from disk or no
running transactions)
My initial thought was to use this return value to decide whether to advance
restart_lsn to the latest LSN (advance when restoring a snapshot or with no running
transactions). However, this decision can be made without relying on the
return value (please see the v2 patch).
Therefore, we can change the function return type to void. The only other
consumer of this return value is SnapBuildPurgeOlderTxn(), which can instead
skip cleanup when no transaction data has accumulated (i.e., when the number
of committed transactions is zero), eliminating the need for the return value
altogether.
> However, I have a question that even if we haven't incremented it in the first
> cycle, why is it not incrementing restart_lsn in consecutive sync cycles.
Because no walsender was active during the reproduction step, so the slot remained
inactive on the publisher and its restart_lsn didn't advanced. As a result, the
slotsync process stalled while continuously retrying the first cycle of
snapshot building.
Best Regards,
Hou zj
Attachments:
[application/octet-stream] v2-0001-Advance-restart_lsn-when-reaching-consistency-wit.patch (6.8K, 2-v2-0001-Advance-restart_lsn-when-reaching-consistency-wit.patch)
download | inline diff:
From 19612c969d68089d6ddca6f78910ccefc556f35a Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Tue, 3 Mar 2026 11:44:38 +0800
Subject: [PATCH v3] Advance restart_lsn when reaching consistency without
waiting
Currently, the replication slot's restart_lsn is not advanced when first time
building a consistent snapshot, even when it's safe to do so. This can lead
to unnecessary retention of WAL segments, though the impact is rare.
This commit advances restart_lsn at the consistency point if either:
a serialized snapshot from a previous decoding session is available, or
there were no running transactions when reaching consistency
In both cases, it's safe and efficient to restart decoding from this LSN,
reducing WAL retention without affecting decoding capabilities.
---
src/backend/replication/logical/snapbuild.c | 86 ++++++++++-----------
1 file changed, 41 insertions(+), 45 deletions(-)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 7f79621b57e..252894dc090 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -167,7 +167,7 @@ static inline bool SnapBuildXidHasCatalogChanges(SnapBuild *builder, Transaction
uint32 xinfo);
/* xlog reading helper functions for SnapBuildProcessRunningXacts */
-static bool SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *running);
+static void SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *running);
static void SnapBuildWaitSnapshot(xl_running_xacts *running, TransactionId cutoff);
/* serialization functions */
@@ -869,31 +869,34 @@ SnapBuildPurgeOlderTxn(SnapBuild *builder)
if (!TransactionIdIsNormal(builder->xmin))
return;
- /* TODO: Neater algorithm than just copying and iterating? */
- workspace =
- MemoryContextAlloc(builder->context,
- builder->committed.xcnt * sizeof(TransactionId));
-
- /* copy xids that still are interesting to workspace */
- for (off = 0; off < builder->committed.xcnt; off++)
+ if (builder->committed.xcnt > 0)
{
- if (NormalTransactionIdPrecedes(builder->committed.xip[off],
- builder->xmin))
- ; /* remove */
- else
- workspace[surviving_xids++] = builder->committed.xip[off];
- }
+ /* TODO: Neater algorithm than just copying and iterating? */
+ workspace =
+ MemoryContextAlloc(builder->context,
+ builder->committed.xcnt * sizeof(TransactionId));
+
+ /* copy xids that still are interesting to workspace */
+ for (off = 0; off < builder->committed.xcnt; off++)
+ {
+ if (NormalTransactionIdPrecedes(builder->committed.xip[off],
+ builder->xmin))
+ ; /* remove */
+ else
+ workspace[surviving_xids++] = builder->committed.xip[off];
+ }
- /* copy workspace back to persistent state */
- memcpy(builder->committed.xip, workspace,
- surviving_xids * sizeof(TransactionId));
+ /* copy workspace back to persistent state */
+ memcpy(builder->committed.xip, workspace,
+ surviving_xids * sizeof(TransactionId));
- elog(DEBUG3, "purged committed transactions from %u to %u, xmin: %u, xmax: %u",
- (uint32) builder->committed.xcnt, (uint32) surviving_xids,
- builder->xmin, builder->xmax);
- builder->committed.xcnt = surviving_xids;
+ elog(DEBUG3, "purged committed transactions from %u to %u, xmin: %u, xmax: %u",
+ (uint32) builder->committed.xcnt, (uint32) surviving_xids,
+ builder->xmin, builder->xmax);
+ builder->committed.xcnt = surviving_xids;
- pfree(workspace);
+ pfree(workspace);
+ }
/*
* Purge xids in ->catchange as well. The purged array must also be sorted
@@ -1143,11 +1146,7 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
* our snapshot so others or we, after a restart, can use it.
*/
if (builder->state < SNAPBUILD_CONSISTENT)
- {
- /* returns false if there's no point in performing cleanup just yet */
- if (!SnapBuildFindSnapshot(builder, lsn, running))
- return;
- }
+ SnapBuildFindSnapshot(builder, lsn, running);
else
SnapBuildSerialize(builder, lsn);
@@ -1197,8 +1196,10 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
*/
/*
- * Can't know about a serialized snapshot's location if we're not
- * consistent.
+ * Cannot advance restart_lsn to a point where a consistent snapshot cannot
+ * be built immediately in the next decoding round (either by restoring a
+ * serialized snapshot or by confirming there are no running transactions).
+ * Doing so could cause data prior to reaching consistency to be lost.
*/
if (builder->state < SNAPBUILD_CONSISTENT)
return;
@@ -1221,6 +1222,15 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
XLogRecPtrIsValid(builder->last_serialized_snapshot))
LogicalIncreaseRestartDecodingForSlot(lsn,
builder->last_serialized_snapshot);
+
+ /*
+ * Reaching here indicates we built the snapshot either by restoring a
+ * serialized snapshot from a previous decoding session or because there
+ * were no running transactions. In either case, it's safe and efficient to
+ * restart from this LSN next time.
+ */
+ else
+ LogicalIncreaseRestartDecodingForSlot(lsn, lsn);
}
@@ -1229,11 +1239,8 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
*
* Helper function for SnapBuildProcessRunningXacts() while we're not yet
* consistent.
- *
- * Returns true if there is a point in performing internal maintenance/cleanup
- * using the xl_running_xacts record.
*/
-static bool
+static void
SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *running)
{
/* ---
@@ -1278,7 +1285,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
SnapBuildWaitSnapshot(running, builder->initial_xmin_horizon);
- return true;
+ return;
}
/*
@@ -1312,8 +1319,6 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
errmsg("logical decoding found consistent point at %X/%08X",
LSN_FORMAT_ARGS(lsn)),
errdetail("There are no running transactions."));
-
- return false;
}
/*
@@ -1324,8 +1329,6 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
!builder->in_slot_creation &&
SnapBuildRestore(builder, lsn))
{
- /* there won't be any state to cleanup */
- return false;
}
/*
@@ -1410,13 +1413,6 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
LSN_FORMAT_ARGS(lsn)),
errdetail("There are no old transactions anymore."));
}
-
- /*
- * We already started to track running xacts and need to wait for all
- * in-progress ones to finish. We fall through to the normal processing of
- * records so incremental cleanup can be performed.
- */
- return true;
}
/* ---
--
2.51.1.windows.1
view thread (13+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected]
Subject: RE: Fix slotsync worker busy loop causing repeated log messages
In-Reply-To: <OS7PR01MB169097E45A7A84AF631534306947FA@OS7PR01MB16909.jpnprd01.prod.outlook.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox