From: Ayush Tiwari <[email protected]>
To: [email protected]
Subject: Proposal: Prevent Primary/Standby SLRU divergence during MultiXact truncation
Date: Mon, 16 Mar 2026 21:39:30 +0530
Message-ID: <CAJTYsWXd4s0eYnN+80ND1t1q6gabk2dbN-w-K6AJj44nz+Xd9Q@mail.gmail.com>

Hi Hackers,

While looking at the MultiXact truncation behavior and reading through the
recent thread about the 17.8 standby crashing during WAL replay (commit
8ba61bc), we noticed an architectural edge case that seems to cause a
silent primary/standby SLRU divergence. I'd like to ask whether this is a
known, accepted risk, or whether a patch to reorder this logic is worth
exploring.

The Issue:

In TruncateMultiXact(), we write the truncation WAL record
(WriteMTruncateXlogRec()) before we actually perform the truncation via
PerformOffsetsTruncation() -> SimpleLruTruncate().

The problem arises from the "apparent wraparound" safety check inside
SimpleLruTruncate(). If that check detects an apparent wraparound,
SimpleLruTruncate() safely bails out and skips unlinking the SLRU segments
on the primary, logging:

    could not truncate directory "%s": apparent wraparound

However, the WAL record for the truncation has already been flushed.
Standbys replay this XLOG_MULTIXACT_TRUNCATE_ID WAL record and blindly
delete their SLRU segments. At this point, the primary and standby have
diverged.
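
To make the ordering concrete, here is a toy model of the sequence described above. It is only a sketch and not PostgreSQL code: ToyNode, ToyWal and every toy_* function are hypothetical names invented for illustration, a boolean array stands in for the SLRU segment files, and the wraparound_detected flag stands in for the safety check inside SimpleLruTruncate(). The standby side follows the behavior as described in this email (replay without re-checking).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NSEGS 8

/* Toy stand-in for one node's SLRU directory: true means the segment exists. */
typedef struct ToyNode
{
    bool        seg_exists[TOY_NSEGS];
} ToyNode;

/* Toy WAL holding at most one truncation record. */
typedef struct ToyWal
{
    bool        has_truncate;
    int         cutoff_seg;
} ToyWal;

static void
toy_node_init(ToyNode *node)
{
    for (int i = 0; i < TOY_NSEGS; i++)
        node->seg_exists[i] = true;
}

/* Remove every segment that lies before cutoff_seg. */
static void
toy_unlink_before(ToyNode *node, int cutoff_seg)
{
    for (int i = 0; i < cutoff_seg && i < TOY_NSEGS; i++)
        node->seg_exists[i] = false;
}

/*
 * Primary side with the current ordering: log first, then truncate.
 * wraparound_detected is an assumed input that models the safety check.
 */
static void
toy_primary_truncate_log_first(ToyNode *primary, ToyWal *wal,
                               int cutoff_seg, bool wraparound_detected)
{
    /* The record is emitted before we know whether truncation will happen. */
    wal->has_truncate = true;
    wal->cutoff_seg = cutoff_seg;

    if (wraparound_detected)
        return;                 /* bail out, segments stay on the primary */

    toy_unlink_before(primary, cutoff_seg);
}

/* Standby side as described in the email: apply the record unconditionally. */
static void
toy_standby_replay(ToyNode *standby, const ToyWal *wal)
{
    if (wal->has_truncate)
        toy_unlink_before(standby, wal->cutoff_seg);
}

/* Count segments that exist on exactly one of the two nodes. */
static int
toy_count_divergent(const ToyNode *a, const ToyNode *b)
{
    int         n = 0;

    for (int i = 0; i < TOY_NSEGS; i++)
        if (a->seg_exists[i] != b->seg_exists[i])
            n++;
    return n;
}
```

Calling toy_primary_truncate_log_first() with wraparound_detected set to true and then toy_standby_replay() leaves the primary with all of its segments while the standby has dropped the ones before the cutoff, so toy_count_divergent() reports a nonzero difference.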

The Impact:

If the standby is subsequently promoted to primary, any attempt to access
rows holding those older MultiXact IDs (which the original primary decided
to keep) will fail with a "could not access status of transaction" error,
effectively resulting in data loss / inaccessible rows for the user.

While the recent commits address the immediate standby crash involving
latest_page_number during multixact_redo(), they don't seem to prevent the
primary from emitting a "false" WAL truncation record when it abandons its
own truncation.

Proposed Approach:

It seems safer to only emit the WAL record if we are guaranteed to follow
through with the truncation. We could modify SimpleLruTruncate() to perform
its safety checks first and return a boolean indicating whether the
truncation is safe to proceed. TruncateMultiXact() would then only call
WriteMTruncateXlogRec() and proceed with physical deletion if the check
passes.
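
The check-first ordering can be sketched with a similar toy model. Again, this is an illustration under stated assumptions rather than a proposed implementation: SketchSlru, SketchWal and the sketch_* functions are hypothetical, and the safety test is simplified to "the cutoff must not pass the segment holding the newest page", without the wraparound-aware comparison the real SLRU code needs.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NSEGS 8

/* Toy SLRU: segment presence flags plus the segment holding the newest page. */
typedef struct SketchSlru
{
    bool        seg_exists[SKETCH_NSEGS];
    int         latest_seg;
} SketchSlru;

/* Toy WAL that only counts truncation records and remembers the last cutoff. */
typedef struct SketchWal
{
    int         ntruncate_records;
    int         last_cutoff_seg;
} SketchWal;

static void
sketch_slru_init(SketchSlru *slru, int latest_seg)
{
    for (int i = 0; i < SKETCH_NSEGS; i++)
        slru->seg_exists[i] = true;
    slru->latest_seg = latest_seg;
}

/*
 * Hypothetical safety check split out of the truncation path. Returns true
 * when removing everything before cutoff_seg would keep the newest segment.
 */
static bool
sketch_truncate_is_safe(const SketchSlru *slru, int cutoff_seg)
{
    return cutoff_seg <= slru->latest_seg;
}

/*
 * Check first, then log, then truncate. Returns false (and leaves both the
 * WAL and the segments untouched) when the check fails.
 */
static bool
sketch_truncate_check_first(SketchSlru *slru, SketchWal *wal, int cutoff_seg)
{
    if (!sketch_truncate_is_safe(slru, cutoff_seg))
        return false;           /* nothing logged, nothing removed */

    /* Only now emit the truncation record. */
    wal->ntruncate_records++;
    wal->last_cutoff_seg = cutoff_seg;

    /* Physical truncation follows the record. */
    for (int i = 0; i < cutoff_seg && i < SKETCH_NSEGS; i++)
        slru->seg_exists[i] = false;
    return true;
}
```

With this ordering, a rejected truncation produces no record for a standby to replay. The sketch does not model locking, so it says nothing about whether the result of the check could change between the check and the physical truncation in the real code.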

I have attached a rough draft patch illustrating this sequence change.

Is this a scenario the community has already considered, or is this
reordering something that should be explored further to harden standby
reliability?

P.S. I also noticed the same pattern in clog.c.

Thanks for your time.

Regards,
Ayush

Attachments:
[application/octet-stream] v1-prevent-multixact-slru-divergence 2.patch (1.9K)

From 9b8830fe35ac9a2c2efabc1e91244e8abcd12345 Mon Sep 17 00:00:00 2001
From: Ayush Tiwari <[email protected]>
Date: Mon, 16 Mar 2026 12:00:00 +0000
Subject: [Draft Patch v1] Prevent SLRU divergence by checking wraparound before WAL
flush
In TruncateMultiXact(), if SimpleLruTruncate() encounters an apparent
wraparound, it aborts the truncation safely. However, the TRUNCATE_ID
WAL record has already been emitted. Standbys blindly replay this record
and delete their SLRU segments.
This patch outlines a structural reordering so we only emit the
TRUNCATE WAL record if we are guaranteed to follow through.
---
src/backend/access/transam/multixact.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index a1b2c3d4e..f5g6h7i8j 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2980,10 +2980,20 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
MyProc->delayChkptFlags |= DELAY_CHKPT_START;
- /* WAL log truncation */
- WriteMTruncateXlogRec(newOldestMultiDB, newOldestMulti, newOldestOffset);
+ /*
+ * TODO/DRAFT:
+ * We should check if SimpleLruTruncate() will abort due to
+ * apparent wraparound *before* flushing the WAL.
+ *
+ * if (SimpleLruCheckWraparound(MultiXactOffsetCtl, MultiXactIdToOffsetPage(newOldestMulti)) &&
+ *     SimpleLruCheckWraparound(MultiXactMemberCtl, MXOffsetToMemberPage(newOldestOffset)))
+ * {
+ *     // WAL log truncation
+ *     WriteMTruncateXlogRec(newOldestMultiDB, newOldestMulti, newOldestOffset);
+ *     // update in-memory limits, etc.
+ *     // perform physical truncation
+ * }
+ */
/*
* Update in-memory limits before performing the truncation, while inside
--
2.34.1