17.8 standby crashes during WAL replay from 17.5 primary: "could not access status of transaction"

public inbox for [email protected]  

5+ messages / 3 participants

* 17.8 standby crashes during WAL replay from 17.5 primary: "could not access status of transaction"
@ 2026-02-13 20:31 Sebastian Webber <[email protected]>
  2026-02-14 11:42 ` Re: 17.8 standby crashes during WAL replay from 17.5 primary: "could not access status of transaction" Heikki Linnakangas <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Sebastian Webber @ 2026-02-13 20:31 UTC (permalink / raw)
  To: [email protected]

PostgreSQL version: 17.8 (standby), 17.5 (primary)

Primary: PostgreSQL 17.5 (Debian 17.5-1.pgdg130+1) on
aarch64-unknown-linux-gnu
Standby: PostgreSQL 17.8 (Debian 17.8-1.pgdg13+1) on
aarch64-unknown-linux-gnu

Platform: Docker containers on macOS (Apple Silicon / aarch64), Docker
Desktop

Description
-----------

A PostgreSQL 17.8 standby crashes during WAL replay when streaming
from a 17.5 primary. The crash occurs after replaying a
MultiXact/TRUNCATE_ID record followed by a MultiXact/CREATE_ID
record.

Steps to reproduce
------------------

1. Start a 17.5 primary configured for streaming replication
2. Seed a database with ~2GB of data (tables with foreign key
   constraints)
3. Start a 17.5 standby via pg_basebackup, confirm streaming
   replication
4. Generate ~500K MultiXact IDs using concurrent SELECT ... FOR SHARE
   / FOR KEY SHARE on the same rows
5. Run VACUUM on the multixact-heavy tables (generates TRUNCATE_ID
   WAL records)
6. Stop the 17.5 standby
7. Continue generating ~2M additional MultiXact IDs on the primary
   (builds WAL backlog)
8. Start a 17.8 standby on the same data volume -- it begins
   replaying the WAL backlog
9. Standby crashes during replay

An automated reproducer (Go program + shell scripts) is available at:
https://gist.github.com/sebastianwebber/2cd25d298bfe85cabcd8d41f83591acb

It requires Go 1.22+ and Docker. Typical runtime is ~10 minutes.

  go run main.go --cleanup

Actual output (standby log)
----------------------------

The standby successfully replays multiple SLRU page boundaries with
this pattern:

  DEBUG:  next offsets page is not initialized, initializing it now
  CONTEXT:  WAL redo at 3/28C148D8 for MultiXact/CREATE_ID: 856063 offset 6680130 nmembers 9: ...
  DEBUG:  skipping initialization of offsets page 418 because it was already initialized on multixid creation
  CONTEXT:  WAL redo at 3/28C149B8 for MultiXact/ZERO_OFF_PAGE: 418

This repeats for pages 408 through 418. Then a truncation occurs:

  DEBUG:  replaying multixact truncation: offsets [1, 490986), offsets segments [0, 7), members [1, 3864017), members segments [0, 49)
  CONTEXT:  WAL redo at 3/29D6D548 for MultiXact/TRUNCATE_ID: offsets [1, 490986), members [1, 3864017)

The very next CREATE_ID crashes:

  FATAL:  could not access status of transaction 858112
  DETAIL:  Could not read from file "pg_multixact/offsets/000D" at offset 24576: read too few bytes.
  CONTEXT:  WAL redo at 3/2A3AB408 for MultiXact/CREATE_ID: 858111 offset 6695072 nmembers 5: 1048228 (sh) 1048271 (keysh) 1048316 (sh) 1048344 (keysh) 1048370 (sh)

  LOG:  startup process (PID 29) exited with exit code 1
  LOG:  shutting down due to startup process failure
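
For reference, the file name and byte offset in the FATAL message can be derived from the multixact ID alone. The sketch below is a minimal illustration, assuming a stock build with 8 kB blocks, 4-byte offsets entries (2048 per page) and 32 SLRU pages per segment; none of these values is stated in the report. `OffsetLocation` and `locate_offset` are hypothetical names for this illustration, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed build-time constants (stock defaults, not taken from the report). */
#define BLCKSZ 8192
#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / 4)   /* 4-byte offset entries */
#define SLRU_PAGES_PER_SEGMENT 32

/* Hypothetical helper type: where one multixact's offset entry lives on disk. */
typedef struct OffsetLocation
{
    int64_t pageno;       /* SLRU page number */
    int     entryno;      /* slot within that page */
    int64_t segno;        /* segment number, printed as %04X in the file name */
    int64_t byte_offset;  /* start of the page within the segment file */
} OffsetLocation;

static OffsetLocation
locate_offset(uint32_t multi)
{
    OffsetLocation loc;

    loc.pageno = multi / MULTIXACT_OFFSETS_PER_PAGE;
    loc.entryno = (int) (multi % MULTIXACT_OFFSETS_PER_PAGE);
    loc.segno = loc.pageno / SLRU_PAGES_PER_SEGMENT;
    loc.byte_offset = (loc.pageno % SLRU_PAGES_PER_SEGMENT) * (int64_t) BLCKSZ;

    /* The page must start inside the segment file. */
    assert(loc.byte_offset < (int64_t) SLRU_PAGES_PER_SEGMENT * BLCKSZ);
    return loc;
}
```

Under those assumptions, `locate_offset(858112)` gives page 419, segment 13 (file `000D`) and byte offset 24576, which matches the DETAIL line. `locate_offset(858111)` gives entry 2047 of page 418, the last slot on that page, so the CREATE_ID being replayed is the one whose successor falls on a new offsets page.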

Expected output
---------------

The standby should successfully replay all WAL records and reach a
consistent streaming state.

Configuration (non-default on primary)
--------------------------------------

  wal_level = replica
  max_wal_senders = 10
  max_connections = 1200
  shared_buffers = 256MB
  wal_keep_size = 16GB
  autovacuum_multixact_freeze_max_age = 100000
  vacuum_multixact_freeze_min_age = 1000
  vacuum_multixact_freeze_table_age = 50000

Standby configured with log_min_messages = debug1.

-- 
Sebastian Webber


* Re: 17.8 standby crashes during WAL replay from 17.5 primary: "could not access status of transaction"

From: Heikki Linnakangas @ 2026-02-14 11:42 UTC (permalink / raw)
  To: Sebastian Webber <[email protected]>; [email protected]; +Cc: Andrey Borodin <[email protected]>; Álvaro Herrera <[email protected]>; Dmitry Yurichev <[email protected]>; Chao Li <[email protected]>; Ivan Bykov <[email protected]>; Kirill Reshke <[email protected]>

On 13/02/2026 22:31, Sebastian Webber wrote:
> PostgreSQL version: 17.8 (standby), 17.5 (primary)
> 
> Primary: PostgreSQL 17.5 (Debian 17.5-1.pgdg130+1) on
> aarch64-unknown-linux-gnu
> Standby: PostgreSQL 17.8 (Debian 17.8-1.pgdg13+1) on
> aarch64-unknown-linux-gnu
> 
> Platform: Docker containers on macOS (Apple Silicon / aarch64), Docker 
> Desktop
> 
> 
> Description
> -----------
> 
> A PostgreSQL 17.8 standby crashes during WAL replay when streaming
> from a 17.5 primary. The crash occurs after replaying a
> MultiXact/TRUNCATE_ID record followed by a MultiXact/CREATE_ID
> record.

Thanks for the report, I can repro it with your script. It is indeed a
regression introduced in the latest minor release, in the logic to
replay multixact WAL generated on older minor versions (commit
8ba61bc063). Adding the folks from the thread that led to that commit.

The commit added this in RecordNewMultiXact():

> 	/*
> 	 * Older minor versions didn't set the next multixid's offset in this
> 	 * function, and therefore didn't initialize the next page until the next
> 	 * multixid was assigned.  If we're replaying WAL that was generated by
> 	 * such a version, the next page might not be initialized yet.  Initialize
> 	 * it now.
> 	 */
> 	if (InRecovery &&
> 		next_pageno != pageno &&
> 		pg_atomic_read_u64(&MultiXactOffsetCtl->shared->latest_page_number) == pageno)
> 	{
> 		elog(DEBUG1, "next offsets page is not initialized, initializing it now");

The idea is that if the next offset falls on a different page 
(next_pageno != pageno), and we have not yet initialized the next page 
(pg_atomic_read_u64(&MultiXactOffsetCtl->shared->latest_page_number) == 
pageno), we initialize it now. However, that last check goes wrong after 
a truncation record is replayed. Replaying a truncation record does this:

> 
> 		/*
> 		 * During XLOG replay, latest_page_number isn't necessarily set up
> 		 * yet; insert a suitable value to bypass the sanity test in
> 		 * SimpleLruTruncate.
> 		 */
> 		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
> 		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
> 							pageno);

Thanks to that, latest_page_number moves backwards to a much older page
number. That breaks the "was the next offset page already initialized?"
test in RecordNewMultiXact().
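
To make the failure mode concrete, here is a toy model of just that check. It is a sketch, not PostgreSQL code: `ToyOffsetsSlru`, `would_init_next_page` and `replay_truncate_prefix` are hypothetical names, it assumes 2048 entries per offsets page, and it ignores `InRecovery` and multixact ID wraparound. The page numbers used in the note that follows (418 and 239) are derived from the reporter's log under that page-size assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MULTIXACT_OFFSETS_PER_PAGE 2048   /* assumes BLCKSZ 8192, 4-byte entries */

/* Hypothetical stand-in for the shared SLRU state; only one field is modeled. */
typedef struct ToyOffsetsSlru
{
    int64_t latest_page_number;
} ToyOffsetsSlru;

static int64_t
multi_to_page(uint32_t multi)
{
    return multi / MULTIXACT_OFFSETS_PER_PAGE;
}

/*
 * Mirrors the quoted condition: when replaying CREATE_ID for `multi`,
 * would the next offsets page be initialized?
 */
static bool
would_init_next_page(const ToyOffsetsSlru *slru, uint32_t multi)
{
    int64_t pageno = multi_to_page(multi);
    int64_t next_pageno = multi_to_page(multi + 1);

    assert(slru != NULL);
    return next_pageno != pageno && slru->latest_page_number == pageno;
}

/* Models TRUNCATE_ID replay before the fix: latest_page_number is overwritten. */
static void
replay_truncate_prefix(ToyOffsetsSlru *slru, uint32_t end_trunc_off)
{
    assert(slru != NULL);
    slru->latest_page_number = multi_to_page(end_trunc_off);
}
```

With `latest_page_number` at 418, `would_init_next_page(&slru, 858111)` is true. After `replay_truncate_prefix(&slru, 490986)` the field drops to 239 and the same call returns false, so in this model page 419 is never initialized before the next multixid's offset is written to it.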

I don't understand why that "bypass the sanity check" is needed. As far 
as I can see, latest_page_number is tracked accurately during WAL 
replay, and should already be set up. It's initialized in 
StartupMultiXact(), and updated whenever the next page is initialized.

That was introduced a long time ago, in commit 4f627f8973, which in turn
was backpatched and had to deal with WAL that was generated before that
commit. I suspect it was necessary back then, for backwards
compatibility, but isn't necessary any more. Hence, I propose to remove
that "bypass the sanity check" code (attached). Does anyone see a
scenario where latest_page_number might not be set correctly?

If we want to play it even more safe -- and I guess that's the right
thing to do for backpatching -- we could set latest_page_number
*temporarily* while we do the truncation, and restore the old value
afterwards.
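
The "set temporarily, then restore" alternative could look roughly like the following. This is only a sketch under stated assumptions: it is not the attached patch (which simply deletes the overwrite), all names are hypothetical, locking is not modeled, and the truncation step is passed in as a callback so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared SLRU state; only one field is modeled. */
typedef struct ToyOffsetsSlru
{
    int64_t latest_page_number;
} ToyOffsetsSlru;

/* Records what the toy truncation saw, so the effect can be inspected. */
static int64_t toy_last_seen_latest_page = -1;

/* Toy truncation step: only notes the latest_page_number visible to it. */
static void
toy_truncate(ToyOffsetsSlru *slru, int64_t cutoff_page)
{
    (void) cutoff_page;
    toy_last_seen_latest_page = slru->latest_page_number;
}

/*
 * Override latest_page_number only for the duration of the truncation,
 * then put the accurately tracked value back.
 */
static void
truncate_with_temporary_latest_page(ToyOffsetsSlru *slru, int64_t cutoff_page,
                                    void (*do_truncate) (ToyOffsetsSlru *, int64_t))
{
    int64_t saved;

    assert(slru != NULL && do_truncate != NULL);
    saved = slru->latest_page_number;

    slru->latest_page_number = cutoff_page;   /* satisfies the sanity test */
    do_truncate(slru, cutoff_page);
    slru->latest_page_number = saved;         /* later checks see the real value */
}
```

Calling `truncate_with_temporary_latest_page(&slru, 239, toy_truncate)` with `latest_page_number` at 418 lets the truncation step observe 239 while leaving 418 in place afterwards, which is the property the next-page check in RecordNewMultiXact() depends on.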

This fixes the bug. With this fix, you can replay WAL that's already 
been generated.

- Heikki

Attachments:

  [text/x-patch] 0001-Don-t-reset-latest_page_number-when-replaying-multix.patch (1.9K)
From 59556e5b24f7973b857e54e6fcd136d401c9ff0f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Sat, 14 Feb 2026 13:30:03 +0200
Subject: [PATCH 1/1] Don't reset 'latest_page_number' when replaying multixid
 truncation

'latest_page_number' is set to the correct value, according to
nextOffset, early at system startup. Contrary to the comment, it hence
should be set up correctly by the time we get to WAL replay.

This fixes a failure to replay WAL generated on older minor versions,
before commit 789d65364c (18.2, 17.8, 16.12, 15.16, 14.21).

Discussion: https://www.postgresql.org/message-id/[email protected];lightning.p46.dedyn.io
---
 src/backend/access/transam/multixact.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index c863e4e0556..e45ec0d7247 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -3571,7 +3571,6 @@ multixact_redo(XLogReaderState *record)
 	else if (info == XLOG_MULTIXACT_TRUNCATE_ID)
 	{
 		xl_multixact_truncate xlrec;
-		int64		pageno;

 		memcpy(&xlrec, XLogRecGetData(record),
 			   SizeOfMultiXactTruncate);
@@ -3596,15 +3595,6 @@ multixact_redo(XLogReaderState *record)
 		SetMultiXactIdLimit(xlrec.endTruncOff, xlrec.oldestMultiDB, false);

 		PerformMembersTruncation(xlrec.startTruncMemb, xlrec.endTruncMemb);
-
-		/*
-		 * During XLOG replay, latest_page_number isn't necessarily set up
-		 * yet; insert a suitable value to bypass the sanity test in
-		 * SimpleLruTruncate.
-		 */
-		pageno = MultiXactIdToOffsetPage(xlrec.endTruncOff);
-		pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
-							pageno);
 		PerformOffsetsTruncation(xlrec.startTruncOff, xlrec.endTruncOff);

 		LWLockRelease(MultiXactTruncationLock);
-- 
2.47.3


* Re: 17.8 standby crashes during WAL replay from 17.5 primary: "could not access status of transaction"

From: Andrey Borodin @ 2026-02-14 16:18 UTC (permalink / raw)
  To: Heikki Linnakangas <[email protected]>; +Cc: Sebastian Webber <[email protected]>; [email protected]; Andrey Borodin <[email protected]>; Álvaro Herrera <[email protected]>; Dmitry Yurichev <[email protected]>; Chao Li <[email protected]>; Ivan Bykov <[email protected]>; Kirill Reshke <[email protected]>

Ouch...

I remember this place. For some reason I thought endTruncOff was the end of the offsets. That would make sense here... Now I see it's just the new oldest offset.

> On 14 Feb 2026, at 16:42, Heikki Linnakangas <[email protected]> wrote:
> 
> If we want to play it even more safe -- and I guess that's the right thing to do for backpatching -- we could set latest_page_number *temporarily* while we do the truncation, and restore the old value afterwards.

As far as I can see, the only relevant usage of latest_page_number is:
/*
 * While we are holding the lock, make an important safety check: the
 * current endpoint page must not be eligible for removal.
 */
if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
{
	LWLockRelease(shared->ControlLock);
	ereport(LOG,
		(errmsg("could not truncate directory \"%s\": apparent wraparound",
		ctl->Dir)));
	return;
}


Perhaps, we also can bump latest_page_number forward?
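
To see how the safety check quoted above reacts to different values of latest_page_number, here is a simplified model of the page comparison. It is a sketch based on an assumption about how the offsets SLRU's PagePrecedes callback works (a wraparound-aware comparison of the first multixact ID on each page); the function names are hypothetical, and 2048 entries per page is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MULTIXACT_OFFSETS_PER_PAGE 2048u   /* assumes BLCKSZ 8192, 4-byte entries */
#define FIRST_MULTIXACT_ID 1u

/* Wraparound-aware "a is older than b" on 32-bit multixact IDs. */
static bool
multi_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/* Simplified stand-in for the offsets SLRU's PagePrecedes callback. */
static bool
offset_page_precedes(int64_t page1, int64_t page2)
{
    uint32_t multi1;
    uint32_t multi2;

    assert(page1 >= 0 && page2 >= 0);
    multi1 = (uint32_t) page1 * MULTIXACT_OFFSETS_PER_PAGE + FIRST_MULTIXACT_ID + 1;
    multi2 = (uint32_t) page2 * MULTIXACT_OFFSETS_PER_PAGE + FIRST_MULTIXACT_ID + 1;

    return multi_precedes(multi1, multi2) &&
           multi_precedes(multi1, multi2 + MULTIXACT_OFFSETS_PER_PAGE - 1);
}

/* True when the safety check would log "apparent wraparound" and skip truncation. */
static bool
truncation_refused(int64_t latest_page_number, int64_t cutoff_page)
{
    return offset_page_precedes(latest_page_number, cutoff_page);
}
```

In this model an accurately tracked value passes the check without any override: `truncation_refused(418, 239)` is false. A hypothetical stale value such as 0 would be refused (`truncation_refused(0, 239)` is true), which is presumably the situation the old "bypass" code was written for.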


Best regards, Andrey Borodin.






* Re: 17.8 standby crashes during WAL replay from 17.5 primary: "could not access status of transaction"

From: Andrey Borodin @ 2026-02-14 17:41 UTC (permalink / raw)
  To: Heikki Linnakangas <[email protected]>; +Cc: Sebastian Webber <[email protected]>; [email protected]; Andrey Borodin <[email protected]>; Álvaro Herrera <[email protected]>; Dmitry Yurichev <[email protected]>; Chao Li <[email protected]>; Ivan Bykov <[email protected]>; Kirill Reshke <[email protected]>



> On 14 Feb 2026, at 21:18, Andrey Borodin <[email protected]> wrote:
> 
> Perhaps, we also can bump latest_page_number forward?

This is not a good idea: we don't want the "most accurate" latest_page_number, we need the precise number at any point after StartupMultiXact().

Removing the write done by XLOG_MULTIXACT_TRUNCATE_ID replay seems correct to me everywhere, 14-18.

I'd also suggest updating the comment:

* this is not critical data, since we use it only to avoid swapping out
* the latest page.

It's absolutely critical now.


Best regards, Andrey Borodin.






* Re: 17.8 standby crashes during WAL replay from 17.5 primary: "could not access status of transaction"

From: Andrey Borodin @ 2026-02-15 18:11 UTC (permalink / raw)
  To: Heikki Linnakangas <[email protected]>; +Cc: Sebastian Webber <[email protected]>; [email protected]; Andrey Borodin <[email protected]>; Álvaro Herrera <[email protected]>; Dmitry Yurichev <[email protected]>; Chao Li <[email protected]>; Ivan Bykov <[email protected]>; Kirill Reshke <[email protected]>



> On 14 Feb 2026, at 22:41, Andrey Borodin <[email protected]> wrote:
> 
> Removing the write done by XLOG_MULTIXACT_TRUNCATE_ID replay seems correct to me everywhere, 14-18.

FWIW, I tried to create a TAP reproducer, but it's tricky in a controlled environment.
I did, however, create a TAP test that triggers a near-wraparound truncation:

2026-02-15 23:05:57.716 +05 [73950] DEBUG: replaying multixact truncation: offsets [1, 2147483648), offsets segments [0, 8000), members [1, 3), members segments [0, 0)
2026-02-15 23:05:57.716 +05 [73950] CONTEXT: WAL redo at 0/309CD70 for MultiXact/TRUNCATE_ID: offsets [1, 2147483648), members [1, 3)
2026-02-15 23:05:57.716 +05 [73950] DEBUG: MultiXactId wrap limit is 4294967295, limited by database with OID 1
2026-02-15 23:05:57.716 +05 [73950] CONTEXT: WAL redo at 0/309CD70 for MultiXact/TRUNCATE_ID: offsets [1, 2147483648), members [1, 3)
2026-02-15 23:05:57.716 +05 [73950] LOG: file "pg_multixact/offsets/8000" doesn't exist, reading as zeroes


And I observe no problems with "0001-Don-t-reset-latest_page_number-when-replaying-multix.patch" applied.


Best regards, Andrey Borodin.






Attachments:

  [application/octet-stream] 0001-Test-Multixact-truncation-near-araparound.patch (13.5K)
From 0f6d9f7daf449ec2c1efdfacfffc4701560d4be5 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Sun, 15 Feb 2026 21:16:38 +0500
Subject: [PATCH 1/2] Test Multixact truncation near araparound

---
 src/bin/pg_resetwal/pg_resetwal.c             |  20 ++-
 src/test/modules/test_slru/Makefile           |   4 +-
 .../test_slru/t/002_multixact_wraparound.pl   | 165 ++++++++++++++++++
 src/test/modules/test_slru/test_multixact.c   |  54 ++++++
 src/test/modules/test_slru/test_slru--1.0.sql |   5 +
 5 files changed, 245 insertions(+), 3 deletions(-)
 create mode 100644 src/test/modules/test_slru/t/002_multixact_wraparound.pl
 create mode 100644 src/test/modules/test_slru/test_multixact.c

diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 6cb2bf47568..5f2b07c3038 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -73,6 +73,7 @@ static bool mxid_given = false;
 static MultiXactId set_mxid = 0;
 static bool mxoff_given = false;
 static MultiXactOffset set_mxoff = 0;
+static bool wal_level_replica = false;
 static TimeLineID minXlogTli = 0;
 static XLogSegNo minXlogSegNo = 0;
 static int	WalSegSz;
@@ -108,6 +109,7 @@ main(int argc, char *argv[])
 		{"oldest-transaction-id", required_argument, NULL, 'u'},
 		{"next-transaction-id", required_argument, NULL, 'x'},
 		{"wal-segsize", required_argument, NULL, 1},
+		{"wal-level", required_argument, NULL, 3},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -307,6 +309,19 @@ main(int argc, char *argv[])
 					break;
 				}
 
+			case 3:
+				if (pg_strcasecmp(optarg, "replica") == 0)
+					wal_level_replica = true;
+				else if (pg_strcasecmp(optarg, "minimal") == 0)
+					wal_level_replica = false;
+				else
+				{
+					pg_log_error("invalid argument for option %s", "--wal-level");
+					pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+					exit(1);
+				}
+				break;
+
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -683,7 +698,7 @@ GuessControlValues(void)
 
 	/* minRecoveryPoint, backupStartPoint and backupEndPoint can be left zero */
 
-	ControlFile.wal_level = WAL_LEVEL_MINIMAL;
+	ControlFile.wal_level = wal_level_replica ? WAL_LEVEL_REPLICA : WAL_LEVEL_MINIMAL;
 	ControlFile.wal_log_hints = false;
 	ControlFile.track_commit_timestamp = false;
 	ControlFile.MaxConnections = 100;
@@ -890,7 +905,7 @@ RewriteControlFile(void)
 	 * as long as wal_level='minimal'; the postmaster will reset these fields
 	 * anyway at startup.
 	 */
-	ControlFile.wal_level = WAL_LEVEL_MINIMAL;
+	ControlFile.wal_level = wal_level_replica ? WAL_LEVEL_REPLICA : WAL_LEVEL_MINIMAL;
 	ControlFile.wal_log_hints = false;
 	ControlFile.track_commit_timestamp = false;
 	ControlFile.MaxConnections = 100;
@@ -1202,6 +1217,7 @@ usage(void)
 	printf(_("  -O, --multixact-offset=OFFSET    set next multitransaction offset\n"));
 	printf(_("  -u, --oldest-transaction-id=XID  set oldest transaction ID\n"));
 	printf(_("  -x, --next-transaction-id=XID    set next transaction ID\n"));
+	printf(_("      --wal-level=LEVEL            set checkpoint wal_level to \"minimal\" or \"replica\"\n"));
 	printf(_("      --wal-segsize=SIZE           size of WAL segments, in megabytes\n"));
 
 	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/test/modules/test_slru/Makefile b/src/test/modules/test_slru/Makefile
index 936886753b7..8870e49da85 100644
--- a/src/test/modules/test_slru/Makefile
+++ b/src/test/modules/test_slru/Makefile
@@ -3,7 +3,8 @@
 MODULE_big = test_slru
 OBJS = \
 	$(WIN32RES) \
-	test_slru.o
+	test_slru.o \
+	test_multixact.o
 PGFILEDESC = "test_slru - test module for SLRUs"
 
 EXTENSION = test_slru
@@ -11,6 +12,7 @@ DATA = test_slru--1.0.sql
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/test_slru/test_slru.conf
 REGRESS = test_slru
+TAP_TESTS = 1
 # Disabled because these tests require "shared_preload_libraries=test_slru",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1
diff --git a/src/test/modules/test_slru/t/002_multixact_wraparound.pl b/src/test/modules/test_slru/t/002_multixact_wraparound.pl
new file mode 100644
index 00000000000..15db0656772
--- /dev/null
+++ b/src/test/modules/test_slru/t/002_multixact_wraparound.pl
@@ -0,0 +1,165 @@
+# Copyright (c) 2024-2026, PostgreSQL Global Development Group
+
+# Test multixact SLRU truncation near wraparound with standby replay.
+# Creates an old multixact (mx 1) on heap, then pg_resetwal to advance next
+# multixact near wraparound.  VACUUM triggers truncation (TRUNCATE_ID WAL).
+# Standby must replay TRUNCATE_ID followed by CREATE_ID without crashing
+# (fixes bug where latest_page_number was incorrectly reset during truncation
+# replay).
+#
+# Uses backup_fs_cold + archive recovery (PITR) because the cold copy preserves
+# mx 1 (no autovacuum truncation before backup), but its checkpoint has
+# wal_level=minimal so streaming is impossible.  Archive all WAL and replay.
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+my $node_primary = PostgreSQL::Test::Cluster->new('main');
+$node_primary->init(
+	has_archiving => 1,
+	allows_streaming => 'physical',
+	auth_extra => [ '--create-role' => 'repl_role' ]);
+$node_primary->append_conf('postgresql.conf',
+	"shared_preload_libraries = 'test_slru'");
+$node_primary->append_conf('postgresql.conf', qq[
+vacuum_multixact_freeze_min_age = 0
+vacuum_multixact_freeze_table_age = 0
+log_min_messages = debug1
+]);
+
+my $node_pgdata = $node_primary->data_dir;
+
+# Create old multixact (mx 1) on heap before pg_resetwal
+$node_primary->start;
+$node_primary->safe_psql('postgres', q(CREATE EXTENSION test_slru));
+$node_primary->safe_psql('postgres', q{
+	CREATE TABLE mx_trunc_tab (id int);
+	INSERT INTO mx_trunc_tab VALUES (1);
+});
+# FOR SHARE creates multixact 1 on heap xmax
+$node_primary->safe_psql('postgres', q{
+	BEGIN;
+	SELECT * FROM mx_trunc_tab FOR SHARE;
+	COMMIT;
+});
+# Create multixact 2 so mx 1's "next offset" is set (needed for GetMultiXactIdMembers)
+$node_primary->safe_psql('postgres', q{SELECT test_create_multixact();});
+
+$node_primary->stop;
+
+# Advance next multixact near wraparound; keep oldest=1 so mx 1 stays valid
+my $next_mx = 2**31;        # 2147483648, at wraparound boundary
+command_ok(
+	[
+		'pg_resetwal',
+		'--wal-level' => 'replica',
+		'--multixact-ids' => sprintf('%u,1', $next_mx),
+		$node_pgdata
+	],
+	"set next multixact to $next_mx (near wraparound), oldest to 1");
+
+# Extract values from pg_resetwal --dry-run for SLRU fixup
+my $out = (run_command([ 'pg_resetwal', '--dry-run', $node_primary->data_dir ]))[0];
+$out =~ /^Database block size: *(\d+)$/m or die "pg_resetwal output missing Database block size";
+my $blcksz = $1;
+# SLRU_PAGES_PER_SEGMENT is a compile-time constant (32) in slru.h; pg_resetwal doesn't output it
+my $slru_pages_per_segment = 32;
+
+# Create segment for next multixact; segment 0 with mx 1 stays for truncation
+my $multixact_offsets_per_page = $blcksz / 8;
+my $segno =
+  int($next_mx / $multixact_offsets_per_page / $slru_pages_per_segment);
+my $slru_file = sprintf('%s/pg_multixact/offsets/%04X', $node_pgdata, $segno);
+open my $fh, ">", $slru_file
+  or die "could not open \"$slru_file\": $!";
+binmode $fh;
+my $bytes_per_seg = $slru_pages_per_segment * $blcksz;
+syswrite($fh, "\0" x $bytes_per_seg) == $bytes_per_seg
+  or die "could not write to \"$slru_file\": $!";
+close $fh;
+
+# Cold copy preserves pg_resetwal state (mx 1 intact); checkpoint has
+# wal_level=minimal so use archive recovery instead of streaming.
+$node_primary->backup_fs_cold('mx_backup');
+
+# Start primary; it will archive WAL with wal_level=replica from config
+$node_primary->start;
+
+# mx 1 must be readable before truncation (segment 0 still exists)
+is( $node_primary->safe_psql('postgres', q{SELECT test_read_multixact('1');}),
+	'',
+	"multixact 1 readable before truncation");
+
+# Advance all databases' datminmxid so system-wide minimum allows truncation.
+# template0 has datallowconn=false by default; allow connections so vacuumdb
+# --all includes it (vacuumdb skips databases with datallowconn=false).
+# vacuum_multixact_freeze_min_age=0 makes MultiXactCutoff=nextMXID (2^31)
+# which exists (from template0 FOR SHARE), so truncation can succeed.
+$node_primary->safe_psql('postgres', q{ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true});
+$node_primary->safe_psql('template0',
+	q{SELECT * FROM pg_catalog.pg_class LIMIT 1 FOR SHARE});
+# Vacuum all databases (including template0) so every relation's relminmxid advances past 1
+$node_primary->command_ok([ 'vacuumdb', '--all', '--freeze', '--port', $node_primary->port ],
+	'vacuumdb --all --freeze');
+$node_primary->safe_psql('postgres', q{ALTER DATABASE template0 WITH ALLOW_CONNECTIONS false});
+
+# CREATE_ID WAL records must follow TRUNCATE_ID - stresses latest_page_number fix.
+# Create enough multixacts to cross an offset page boundary (next-page bug): the
+# first multixact (2^31) is at entry 0 of its page, so we need offsets_per_page
+# more to fill the page, and one more to trigger allocation of the next page.
+my $multixacts_to_next_page = $multixact_offsets_per_page + 1;
+foreach my $i (1 .. $multixacts_to_next_page)
+{
+	$node_primary->safe_psql('postgres', q{SELECT test_create_multixact();});
+}
+
+# Force archive so standby can replay
+$node_primary->safe_psql('postgres', q{SELECT pg_switch_wal()});
+
+# Standby from cold backup, replay via archive (no streaming)
+my $node_standby = PostgreSQL::Test::Cluster->new('standby');
+$node_standby->init_from_backup($node_primary, 'mx_backup',
+	has_restoring => 1,
+	has_streaming => 0,
+	standby => 1);
+$node_standby->append_conf('postgresql.conf',
+	"log_min_messages = debug1\nwal_retrieve_retry_interval = '100ms'\nmax_connections = 100");
+$node_standby->start;
+
+my $primary_lsn = $node_primary->lsn('flush');
+$node_standby->poll_query_until('postgres',
+	qq{SELECT '$primary_lsn'::pg_lsn <= pg_last_wal_replay_lsn()})
+	or die "Timed out waiting for standby to replay";
+
+# Standby must replay truncation (from archive)
+my $standby_log = $node_standby->log_content();
+ok( $standby_log =~ /replaying multixact truncation/,
+	"standby replayed multixact TRUNCATE_ID (truncation near wraparound)");
+
+# Multixact that crossed offset page boundary must be readable (next-page bug)
+my $multi_at_page_boundary = $next_mx + $multixact_offsets_per_page;
+is( $node_standby->safe_psql('postgres', qq{SELECT test_read_multixact('$multi_at_page_boundary');}),
+	'',
+	"multixact at offset page boundary readable on standby (next-page replay)");
+
+# New multixacts must be readable on standby
+my $first_new_multi = $node_primary->safe_psql('postgres',
+	q{SELECT test_create_multixact();});
+$node_primary->safe_psql('postgres', q{SELECT pg_switch_wal()});
+my $final_lsn = $node_primary->lsn('flush');
+$node_standby->poll_query_until('postgres',
+	qq{SELECT '$final_lsn'::pg_lsn <= pg_last_wal_replay_lsn()})
+	or die "Timed out waiting for standby to replay";
+is( $node_standby->safe_psql('postgres', qq{SELECT test_read_multixact('$first_new_multi');}),
+	'',
+	"new multixact readable on standby after truncation replay");
+
+$node_standby->stop;
+$node_primary->stop;
+
+done_testing();
diff --git a/src/test/modules/test_slru/test_multixact.c b/src/test/modules/test_slru/test_multixact.c
new file mode 100644
index 00000000000..7b961668116
--- /dev/null
+++ b/src/test/modules/test_slru/test_multixact.c
@@ -0,0 +1,54 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_multixact.c
+ *		Support code for multixact testing
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *		src/test/modules/test_slru/test_multixact.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/multixact.h"
+#include "access/xact.h"
+#include "fmgr.h"
+
+PG_FUNCTION_INFO_V1(test_create_multixact);
+PG_FUNCTION_INFO_V1(test_read_multixact);
+
+/*
+ * Produces multixact with 2 current xids
+ */
+Datum
+test_create_multixact(PG_FUNCTION_ARGS)
+{
+	MultiXactId id;
+
+	MultiXactIdSetOldestMember();
+	id = MultiXactIdCreate(GetCurrentTransactionId(), MultiXactStatusUpdate,
+						   GetCurrentTransactionId(), MultiXactStatusForShare);
+	PG_RETURN_TRANSACTIONID(id);
+}
+
+/*
+ * Reads given multixact.  Discards local cache to make a real read.
+ */
+Datum
+test_read_multixact(PG_FUNCTION_ARGS)
+{
+	MultiXactId id = PG_GETARG_TRANSACTIONID(0);
+	MultiXactMember *members;
+
+	/* discard caches */
+	AtEOXact_MultiXact();
+
+	if (GetMultiXactIdMembers(id, &members, false, false) == -1)
+		elog(ERROR, "MultiXactId not found");
+
+	PG_RETURN_VOID();
+}
diff --git a/src/test/modules/test_slru/test_slru--1.0.sql b/src/test/modules/test_slru/test_slru--1.0.sql
index 202e8da3fde..a7c6c458a5e 100644
--- a/src/test/modules/test_slru/test_slru--1.0.sql
+++ b/src/test/modules/test_slru/test_slru--1.0.sql
@@ -19,3 +19,8 @@ CREATE OR REPLACE FUNCTION test_slru_page_truncate(bigint) RETURNS VOID
   AS 'MODULE_PATHNAME', 'test_slru_page_truncate' LANGUAGE C;
 CREATE OR REPLACE FUNCTION test_slru_delete_all() RETURNS VOID
   AS 'MODULE_PATHNAME', 'test_slru_delete_all' LANGUAGE C;
+
+CREATE OR REPLACE FUNCTION test_create_multixact() RETURNS xid
+  AS 'MODULE_PATHNAME', 'test_create_multixact' LANGUAGE C;
+CREATE OR REPLACE FUNCTION test_read_multixact(xid) RETURNS VOID
+  AS 'MODULE_PATHNAME', 'test_read_multixact' LANGUAGE C;
-- 
2.51.2




Thread overview: 5+ messages
2026-02-13 20:31 17.8 standby crashes during WAL replay from 17.5 primary: "could not access status of transaction" Sebastian Webber <[email protected]>
2026-02-14 11:42 ` Heikki Linnakangas <[email protected]>
2026-02-14 16:18   ` Andrey Borodin <[email protected]>
2026-02-14 17:41     ` Andrey Borodin <[email protected]>
2026-02-15 18:11       ` Andrey Borodin <[email protected]>
