From: Andrey Borodin <[email protected]>
To: Heikki Linnakangas <[email protected]>
Cc: Kirill Reshke <[email protected]>
Cc: Sebastian Webber <[email protected]>
Cc: [email protected]
Cc: Andrey Borodin <[email protected]>
Cc: Álvaro Herrera <[email protected]>
Cc: Dmitry Yurichev <[email protected]>
Cc: Chao Li <[email protected]>
Cc: Ivan Bykov <[email protected]>
Subject: Re: 17.8 standby crashes during WAL replay from 17.5 primary: "could not access status of transaction"
Date: Wed, 18 Feb 2026 13:58:03 +0500
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <CACV2tSw3VYS7d27ftO_cs+aF3M54+JwWBbqSGLcKoG9cvyb6EA@mail.gmail.com>
	<[email protected]>
	<CALdSSPhMhNzRRd-SeU0PTwKiGDpFOb5Yss7PWBPN3cHv6kW8eQ@mail.gmail.com>
	<[email protected]>



> On 16 Feb 2026, at 21:01, Heikki Linnakangas <[email protected]> wrote:
> 
> Andrey if you can verify with your TAP test, too, that'd be great.

Here's a hand-wavy test on top of REL_17_STABLE. It modifies the server code so that the built binaries simulate the old WAL write behavior.
I tried to guard the hack with -DDEMO_SIMULATE_OLD_MULTIXACT_BEHAVIOR, but gave up and just hardcoded it.
We are not going to commit it anyway, are we?

If we comment out this line (patch does it)

        pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
                            pageno);

the test will pass.

Otherwise it will hang indefinitely, because the standby's startup process fails with:

2026-02-18 13:44:12.238 +05 [52360] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
2026-02-18 13:44:12.250 +05 [52359] FATAL: could not access status of transaction 4096
2026-02-18 13:44:12.250 +05 [52359] DETAIL: Could not read from file "pg_multixact/offsets/0000" at offset 16384: read too few bytes.
2026-02-18 13:44:12.250 +05 [52359] CONTEXT: WAL redo at 0/30245E0 for MultiXact/CREATE_ID: 4095 offset 8189 nmembers 2: 4835 (sh) 4835 (upd)
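
The numbers in this log are consistent with multi 4095 being the last entry on offsets page 1, so that its successor 4096 lives on page 2 of segment file 0000. Here is a minimal sketch of that arithmetic, assuming the default 8kB BLCKSZ, 4-byte multixact offsets and 32 SLRU pages per segment file. The helper names are made up for illustration and are not the ones used in multixact.c or slru.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: default build settings of PostgreSQL 17. */
#define BLCKSZ 8192
#define MULTIXACT_OFFSETS_PER_PAGE ((uint32_t) (BLCKSZ / sizeof(uint32_t)))
#define SLRU_PAGES_PER_SEGMENT 32

/* Offsets page that holds the entry for a given multixact id. */
static int64_t
offset_page(uint32_t multi)
{
	return multi / MULTIXACT_OFFSETS_PER_PAGE;
}

/* Index of the entry within its page. */
static uint32_t
offset_entry(uint32_t multi)
{
	return multi % MULTIXACT_OFFSETS_PER_PAGE;
}

/* Segment file number that contains a given page. */
static int64_t
page_segment(int64_t pageno)
{
	return pageno / SLRU_PAGES_PER_SEGMENT;
}

/* Byte offset of a page inside its segment file. */
static int64_t
page_file_offset(int64_t pageno)
{
	return (pageno % SLRU_PAGES_PER_SEGMENT) * BLCKSZ;
}

/* Check the values against the FATAL/DETAIL/CONTEXT lines above. */
static void
check_log_example(void)
{
	assert(offset_page(4095) == 1);		/* CREATE_ID 4095 is on page 1 */
	assert(offset_entry(4095) == 2047);	/* ... as that page's last entry */
	assert(offset_page(4096) == 2);		/* next multi starts page 2 */
	assert(page_segment(2) == 0);		/* file "0000" */
	assert(page_file_offset(2) == 16384);	/* "at offset 16384" */
}
```

Calling check_log_example() from any small test program passes under these assumptions: the read that fails is the one for page 2, which was never zeroed on the standby.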

The most hand-wavy part is test_multixact_write_truncate_wal(): the truncation is synthetic.
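
The replay sequence that the test drives on the standby can be modeled without a server. The following is a toy model only, not the code in multixact.c: it keeps just latest_page_number and a per-page "exists" flag, and all names (ToyOffsetsSlru, toy_replay_create_id and so on) are hypothetical. It assumes 2048 offsets per page and the init-next-page check as described in the patch comments (latest_page_number == pageno).

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSETS_PER_PAGE 2048	/* assumes 8kB BLCKSZ, 4-byte offsets */
#define MAX_PAGES 8				/* enough for this scenario */

typedef struct
{
	int64_t		latest_page_number;
	bool		page_exists[MAX_PAGES];
} ToyOffsetsSlru;

static int64_t
page_of(uint32_t multi)
{
	return multi / OFFSETS_PER_PAGE;
}

/*
 * Model CREATE_ID replay.  Returns false where the real standby would
 * fail while reading a next page that was never initialized.
 */
static bool
toy_replay_create_id(ToyOffsetsSlru *s, uint32_t multi)
{
	int64_t		pageno = page_of(multi);
	int64_t		next_pageno = page_of(multi + 1);

	if (next_pageno != pageno)
	{
		/* init-next-page check: only fires on the latest page */
		if (s->latest_page_number == pageno)
		{
			s->page_exists[next_pageno] = true;
			s->latest_page_number = next_pageno;
		}
		if (!s->page_exists[next_pageno])
			return false;
	}
	return true;
}

/* Model TRUNCATE_ID replay, with or without the latest_page_number reset. */
static void
toy_replay_truncate_id(ToyOffsetsSlru *s, uint32_t end_trunc_off,
					   bool reset_latest)
{
	if (reset_latest)
		s->latest_page_number = page_of(end_trunc_off);
}

/* Replay the test's WAL: multis 2047..4094, TRUNCATE_ID(10), multi 4095. */
static bool
toy_run_scenario(bool reset_latest)
{
	ToyOffsetsSlru s = {.latest_page_number = page_of(2047)};

	s.page_exists[0] = true;	/* page 0 comes from the backup */
	for (uint32_t m = 2047; m <= 4094; m++)
	{
		if (!toy_replay_create_id(&s, m))
			return false;
	}
	toy_replay_truncate_id(&s, 10, reset_latest);
	return toy_replay_create_id(&s, 4095);
}
```

In this model toy_run_scenario(true) returns false (the reset leaves latest_page_number at 0, so page 2 is never initialized), while toy_run_scenario(false) returns true, which matches the behavior described for the unpatched and patched TRUNCATE_ID replay.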

FWIW, a lot of the calculations and comments were done by an LLM. Let me know if such verbosity hurts readability.


Best regards, Andrey Borodin.


Attachments:

  [application/octet-stream] 0001-Test-Multixact-truncation-near-page-boundary-replay-.patch (13.6K, 2-0001-Test-Multixact-truncation-near-page-boundary-replay-.patch)
From 465eb45cffab0f8503a66288246a0416a0702071 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Wed, 18 Feb 2026 10:11:26 +0500
Subject: [PATCH] Test Multixact truncation near page-boundary replay on
 standby

Add a TAP test that reproduces the bug fixed by commit 4a36c89f165:
TRUNCATE_ID WAL replay resets latest_page_number, breaking the
init-next-page check in RecordNewMultiXact.  When a page-crossing
CREATE_ID is replayed after a TRUNCATE_ID whose endTruncOff lands on
a different page, the standby startup process crashes with:

  FATAL: could not access status of transaction ...
  DETAIL: Could not read from file "pg_multixact/offsets/..." read too few bytes.

To trigger the bug reliably in a single-binary test, two additional
changes to multixact.c simulate WAL from older minor versions
(pre-8ba61bc063):

  - ExtendMultiXactOffset(result) instead of (result + 1), so the
    primary does not pre-zero the next page before the CREATE_ID.
  - The "set next multixid's offset" block in RecordNewMultiXact is
    skipped on the primary (!InRecovery) but kept during recovery,
    so the standby still tries to read the next page.

A helper function test_multixact_write_truncate_wal() injects a
TRUNCATE_ID WAL record with a controlled endTruncOff, simulating
the concurrent truncation + multixact creation that occurs in
production.

Apply the fix (0002-Don-t-reset-latest_page_number-when-replaying-
multix.patch) on top of this patch to verify the test passes.
---
 src/backend/access/transam/multixact.c        |  15 ++-
 src/test/modules/test_slru/Makefile           |   4 +-
 src/test/modules/test_slru/meson.build        |   6 +
 .../t/002_multixact_truncation_replay.pl      |  95 ++++++++++++++++
 src/test/modules/test_slru/test_multixact.c   | 105 ++++++++++++++++++
 src/test/modules/test_slru/test_slru--1.0.sql |   7 ++
 6 files changed, 228 insertions(+), 4 deletions(-)
 create mode 100644 src/test/modules/test_slru/t/002_multixact_truncation_replay.pl
 create mode 100644 src/test/modules/test_slru/test_multixact.c

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index c863e4e0556..e1fc55d0745 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -996,7 +996,15 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 
 	/*
 	 * Set the next multixid's offset to the end of this multixid's members.
+	 *
+	 * On the primary (!InRecovery), skip this to produce WAL without the next
+	 * offset already set — simulating pre-8ba61bc063 behavior.  During
+	 * recovery, keep this code so the standby tries to read the next page,
+	 * triggering the bug when the init-next-page check fails due to
+	 * truncation resetting latest_page_number.
 	 */
+	if (InRecovery)
+	{
 	if (next_pageno == pageno)
 	{
 		next_offptr = offptr + 1;
@@ -1027,6 +1035,7 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		*next_offptr = next_offset;
 		MultiXactOffsetCtl->shared->page_dirty[slotno] = true;
 	}
+	}
 
 	/* Release MultiXactOffset SLRU lock. */
 	LWLockRelease(lock);
@@ -1227,7 +1236,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 	 * Make sure there is room for the next MXID in the file.  Assigning this
 	 * MXID sets the next MXID's offset already.
 	 */
-	ExtendMultiXactOffset(result + 1);
+	ExtendMultiXactOffset(result);
 
 	/*
 	 * Reserve the members space, similarly to above.  Also, be careful not to
@@ -3603,8 +3612,8 @@ multixact_redo(XLogReaderState *record)
 		 * SimpleLruTruncate.
 		 */
 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
-							pageno);
+		// pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
+		// 					pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);
 
 		LWLockRelease(MultiXactTruncationLock);
diff --git a/src/test/modules/test_slru/Makefile b/src/test/modules/test_slru/Makefile
index 936886753b7..8870e49da85 100644
--- a/src/test/modules/test_slru/Makefile
+++ b/src/test/modules/test_slru/Makefile
@@ -3,7 +3,8 @@
 MODULE_big = test_slru
 OBJS = \
 	$(WIN32RES) \
-	test_slru.o
+	test_slru.o \
+	test_multixact.o
 PGFILEDESC = "test_slru - test module for SLRUs"
 
 EXTENSION = test_slru
@@ -11,6 +12,7 @@ DATA = test_slru--1.0.sql
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/test_slru/test_slru.conf
 REGRESS = test_slru
+TAP_TESTS = 1
 # Disabled because these tests require "shared_preload_libraries=test_slru",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/src/test/modules/test_slru/meson.build b/src/test/modules/test_slru/meson.build
index ce91e606313..f589b3ec358 100644
--- a/src/test/modules/test_slru/meson.build
+++ b/src/test/modules/test_slru/meson.build
@@ -2,6 +2,7 @@
 
 test_slru_sources = files(
   'test_slru.c',
+  'test_multixact.c',
 )
 
 if host_system == 'windows'
@@ -32,4 +33,9 @@ tests += {
     'regress_args': ['--temp-config', files('test_slru.conf')],
     'runningcheck': false,
   },
+  'tap': {
+    'tests': [
+      't/002_multixact_truncation_replay.pl',
+    ],
+  },
 }
diff --git a/src/test/modules/test_slru/t/002_multixact_truncation_replay.pl b/src/test/modules/test_slru/t/002_multixact_truncation_replay.pl
new file mode 100644
index 00000000000..4a4140e8bd2
--- /dev/null
+++ b/src/test/modules/test_slru/t/002_multixact_truncation_replay.pl
@@ -0,0 +1,95 @@
+# Copyright (c) 2024-2026, PostgreSQL Global Development Group
+
+# Test multixact SLRU truncation replay on standby.
+#
+# Reproduces the bug fixed by commit 4a36c89f165: during TRUNCATE_ID replay,
+# latest_page_number was reset to MultiXactIdToOffsetPage(endTruncOff).  This
+# broke the init-next-page check in RecordNewMultiXact, which compares
+# latest_page_number == pageno.  If a CREATE_ID that crosses a page boundary
+# is replayed AFTER a TRUNCATE_ID whose endTruncOff is on a different page,
+# the init check doesn't fire, the next page isn't initialized, and
+# SimpleLruReadPage fails with FATAL.
+#
+# The test uses test_multixact_write_truncate_wal() to inject a TRUNCATE_ID
+# WAL record with endTruncOff on page 0, placed between two batches of
+# CREATE_IDs.  This simulates the real-world scenario where truncation runs
+# concurrently with multixact creation.
+#
+# To produce WAL without a pre-zeroed next page (as older minor versions did
+# before 8ba61bc063), two changes in multixact.c are required:
+#   - ExtendMultiXactOffset(result) instead of (result + 1)
+#   - next-offset write in RecordNewMultiXact skipped on primary
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+# MULTIXACT_OFFSETS_PER_PAGE = BLCKSZ/4 = 2048 (for 8kB blocks).
+#
+# Scenario:
+#   1. Create 2046 multixacts (multis 1..2046).  nextMXact = 2047, page 0.
+#   2. Take backup.
+#   3. Create 2048 MORE multixacts (2047..4094).  Multi 2047 crosses page 0->1.
+#   4. Inject TRUNCATE_ID with endTruncOff = 10 (page 0).
+#   5. Create 1 more multixact (4095), last entry on page 1, crossing to page 2.
+#
+# On the standby:
+#   - StartupMultiXact sets latest_page_number = page(2047) = 0
+#   - CREATE_ID(2047) crosses page 0->1: init check fires (0==0), zeros page 1,
+#     latest_page_number updated to 1 by SimpleLruZeroPage
+#   - CREATE_IDs for 2048..4094 on page 1 (no crossings)
+#   - TRUNCATE_ID(endTruncOff=10): latest_page_number reset to page(10) = 0
+#   - CREATE_ID(4095): pageno=1, next_pageno=2
+#     Init check: latest_page_number(0) != pageno(1) -> SKIP
+#     RecordNewMultiXact tries SimpleLruReadPage(page 2) -> FATAL
+#
+# With the fix (not resetting latest_page_number in TRUNCATE_ID replay):
+#   - latest_page_number stays 1 after the page 0->1 crossing
+#   - Init check at CREATE_ID(4095): latest_page_number(1) == pageno(1) -> fires
+#   - Page 2 is initialized -> replay succeeds
+
+my $node_primary = PostgreSQL::Test::Cluster->new('main');
+$node_primary->init(allows_streaming => 'physical');
+$node_primary->append_conf('postgresql.conf',
+	"shared_preload_libraries = 'test_slru'");
+$node_primary->start;
+$node_primary->safe_psql('postgres', q(CREATE EXTENSION test_slru));
+
+# Fill page 0: multis 1..2046, nextMXact = 2047
+$node_primary->safe_psql('postgres', q{SELECT test_create_multixacts(2046)});
+
+$node_primary->backup('mx_backup');
+
+# Fill page 1: multis 2047..4094, nextMXact = 4095.
+# Multi 2047 crosses page 0->1; on the standby the init check zeros page 1.
+$node_primary->safe_psql('postgres', q{SELECT test_create_multixacts(2048)});
+
+# Inject TRUNCATE_ID with endTruncOff on page 0.
+# On the standby this resets latest_page_number from 1 back to 0.
+$node_primary->safe_psql('postgres',
+	q{SELECT test_multixact_write_truncate_wal('10'::xid)});
+
+# Create multi 4095 (page 1, entry 2047) which crosses to page 2.
+# Without the fix the standby crashes here: latest_page_number(0) != pageno(1).
+$node_primary->safe_psql('postgres', q{SELECT test_create_multixact()});
+$node_primary->safe_psql('postgres', q{SELECT pg_switch_wal()});
+
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary, 'mx_backup',
+	has_streaming => 1);
+$node_standby->start;
+
+my $primary_lsn = $node_primary->lsn('flush');
+my $replayed = $node_standby->poll_query_until('postgres',
+	qq{SELECT '$primary_lsn'::pg_lsn <= pg_last_wal_replay_lsn()});
+
+ok($replayed, "standby replayed TRUNCATE_ID + page-crossing CREATE_ID");
+
+$node_standby->stop if $replayed;
+$node_primary->stop;
+
+done_testing();
diff --git a/src/test/modules/test_slru/test_multixact.c b/src/test/modules/test_slru/test_multixact.c
new file mode 100644
index 00000000000..e2f6f6a738d
--- /dev/null
+++ b/src/test/modules/test_slru/test_multixact.c
@@ -0,0 +1,105 @@
+/*-------------------------------------------------------------------------
+ *
+ * test_multixact.c
+ *		Support code for multixact testing
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_slru/test_multixact.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/multixact.h"
+#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xloginsert.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "utils/pg_lsn.h"
+
+PG_FUNCTION_INFO_V1(test_create_multixact);
+PG_FUNCTION_INFO_V1(test_create_multixacts);
+PG_FUNCTION_INFO_V1(test_multixact_write_truncate_wal);
+
+/*
+ * Produces multixact with 2 current xids
+ */
+Datum
+test_create_multixact(PG_FUNCTION_ARGS)
+{
+	MultiXactId id;
+
+	MultiXactIdSetOldestMember();
+	id = MultiXactIdCreate(GetCurrentTransactionId(), MultiXactStatusUpdate,
+						   GetCurrentTransactionId(), MultiXactStatusForShare);
+	PG_RETURN_TRANSACTIONID(id);
+}
+
+/*
+ * Create n multixacts.  Used to quickly fill offset pages for truncation tests.
+ *
+ * Each iteration uses a subtransaction so that GetCurrentTransactionId()
+ * returns a different xid, preventing mXactCacheGetBySet from returning a
+ * cached result and ensuring a new MultiXactId is allocated every time.
+ */
+Datum
+test_create_multixacts(PG_FUNCTION_ARGS)
+{
+	int32		n = PG_GETARG_INT32(0);
+	MultiXactId first_id = InvalidMultiXactId;
+
+	if (n <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("n must be positive")));
+
+	for (int i = 0; i < n; i++)
+	{
+		MultiXactId id;
+
+		BeginInternalSubTransaction(NULL);
+		MultiXactIdSetOldestMember();
+		id = MultiXactIdCreate(GetCurrentTransactionId(), MultiXactStatusUpdate,
+							   GetCurrentTransactionId(), MultiXactStatusForShare);
+		ReleaseCurrentSubTransaction();
+
+		if (i == 0)
+			first_id = id;
+	}
+
+	PG_RETURN_TRANSACTIONID(first_id);
+}
+
+/*
+ * Write a TRUNCATE_ID WAL record with the given endTruncOff.
+ *
+ * This is used to simulate a truncation that sets latest_page_number to a
+ * specific page during standby replay, without actually truncating anything
+ * on the primary.  The standby's multixact_redo handler will reset
+ * latest_page_number = MultiXactIdToOffsetPage(endTruncOff).
+ */
+Datum
+test_multixact_write_truncate_wal(PG_FUNCTION_ARGS)
+{
+	MultiXactId endTruncOff = PG_GETARG_TRANSACTIONID(0);
+	xl_multixact_truncate xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.oldestMultiDB = MyDatabaseId;
+	xlrec.startTruncOff = 1;
+	xlrec.endTruncOff = endTruncOff;
+	xlrec.startTruncMemb = 0;
+	xlrec.endTruncMemb = 0;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, SizeOfMultiXactTruncate);
+	recptr = XLogInsert(RM_MULTIXACT_ID, XLOG_MULTIXACT_TRUNCATE_ID);
+	XLogFlush(recptr);
+
+	PG_RETURN_LSN(recptr);
+}
diff --git a/src/test/modules/test_slru/test_slru--1.0.sql b/src/test/modules/test_slru/test_slru--1.0.sql
index 202e8da3fde..0d6271473bf 100644
--- a/src/test/modules/test_slru/test_slru--1.0.sql
+++ b/src/test/modules/test_slru/test_slru--1.0.sql
@@ -19,3 +19,10 @@ CREATE OR REPLACE FUNCTION test_slru_page_truncate(bigint) RETURNS VOID
   AS 'MODULE_PATHNAME', 'test_slru_page_truncate' LANGUAGE C;
 CREATE OR REPLACE FUNCTION test_slru_delete_all() RETURNS VOID
   AS 'MODULE_PATHNAME', 'test_slru_delete_all' LANGUAGE C;
+
+CREATE OR REPLACE FUNCTION test_create_multixact() RETURNS xid
+  AS 'MODULE_PATHNAME', 'test_create_multixact' LANGUAGE C;
+CREATE OR REPLACE FUNCTION test_create_multixacts(int) RETURNS xid
+  AS 'MODULE_PATHNAME', 'test_create_multixacts' LANGUAGE C;
+CREATE OR REPLACE FUNCTION test_multixact_write_truncate_wal(xid) RETURNS pg_lsn
+  AS 'MODULE_PATHNAME', 'test_multixact_write_truncate_wal' LANGUAGE C;
-- 
2.51.2


