Hi,

On Thu, 21 May 2026 at 14:36, Radim Marek <radim@boringsql.com> wrote:
Although the culprit is known, I've got more data as requested.

#0  0x00007f20e9bdb687 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f20e9bdbc8c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f20e9be6920 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x000055a71796e3ca in PGSemaphoreLock (sema=0x7f20de6d0e38) at ./build/src/backend/port/pg_sema.c:327
#4  0x000055a7179f57ed in LWLockAcquire (lock=0x7f20de6d1800, mode=mode@entry=LW_EXCLUSIVE) at ./build/../src/backend/storage/lmgr/lwlock.c:1314
#5  0x000055a71772dfb2 in SimpleLruWriteAll (ctl=ctl@entry=0x55a717e83040 <MultiXactOffsetCtlData>, allow_redirtied=allow_redirtied@entry=false) at
./build/../src/backend/access/transam/slru.c:1174
#6  0x000055a717727b6f in RecordNewMultiXact (multi=79871, offset=218449, nmembers=2, members=members@entry=0x7f20de6831ec) at
./build/../src/backend/access/transam/multixact.c:944
#7  0x000055a71772a983 in multixact_redo (record=0x55a73a8d0fc8) at ./build/../src/backend/access/transam/multixact.c:3464
#8  0x000055a71774d9b8 in ApplyWalRecord (xlogreader=<optimized out>, record=0x7f20de6831b0, replayTLI=<synthetic pointer>) at
./build/../src/backend/access/transam/xlogrecovery.c:1951
#9  PerformWalRecovery () at ./build/../src/backend/access/transam/xlogrecovery.c:1782
#10 0x000055a717740def in StartupXLOG () at ./build/../src/backend/access/transam/xlog.c:5452
#11 0x000055a71797c7e4 in StartupProcessMain () at ./build/../src/backend/postmaster/startup.c:282
#12 0x000055a717972b20 in AuxiliaryProcessMain (auxtype=auxtype@entry=StartupProcess) at ./build/../src/backend/postmaster/auxprocess.c:141
#13 0x000055a717977db3 in StartChildProcess (type=StartupProcess) at ./build/../src/backend/postmaster/postmaster.c:5381
#14 0x000055a71797bfb8 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x55a73a8d0590) at ./build/../src/backend/postmaster/postmaster.c:1463
#15 0x000055a7176a05bc in main (argc=1, argv=0x55a73a8d0590) at ./build/../src/backend/main/main.c:200

and WAL dump

rmgr: Btree       len (rec/tot):     64/    64, tx:     336098, lsn: 1/32DE75F0, prev 1/32DE7580, desc: INSERT_LEAF off: 244, blkref #0: rel 1663/16384/16432 blk 536
rmgr: MultiXact   len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7630, prev 1/32DE75F0, desc: CREATE_ID 79871 offset 218449 nmembers 2: 336089 (keysh)
336098 (keysh)
rmgr: Heap        len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7668, prev 1/32DE7630, desc: LOCK xmax: 79871, off: 1, infobits: [IS_MULTI, LOCK_ONLY,
KEYSHR_LOCK], flags: 0x00, blkref #0: rel 1663/16384/16418 blk 0
rmgr: Heap        len (rec/tot):     72/    72, tx:     336096, lsn: 1/32DE76A0, prev 1/32DE7668, desc: HOT_UPDATE old_xmax: 336096, old_off: 52, old_infobits: [],
flags: 0x20, new_xmax: 0, new_off: 149, blkref #0: rel 1663/16384/16401 blk 22
rmgr: Heap        len (rec/tot):     71/    71, tx:     336096, lsn: 1/32DE76E8, prev 1/32DE76A0, desc: HOT_UPDATE old_xmax: 336096, old_off: 149, old_infobits: [],
flags: 0x60, new_xmax: 0, new_off: 209, blkref #0: rel 1663/16384/16399 blk 6
rmgr: Heap        len (rec/tot):     79/    79, tx:     336096, lsn: 1/32DE7730, prev 1/32DE76E8, desc: INSERT off: 150, flags: 0x00, blkref #0: rel 1663/16384/16417
blk 741
rmgr: Heap        len (rec/tot):     72/    72, tx:     336097, lsn: 1/32DE7780, prev 1/32DE7730, desc: HOT_UPDATE old_xmax: 336097, old_off: 243, old_infobits: [],
flags: 0x20, new_xmax: 0, new_off: 228, blkref #0: rel 1663/16384/16401 blk 26
rmgr: Transaction len (rec/tot):     34/    34, tx:     336096, lsn: 1/32DE77C8, prev 1/32DE7780, desc: COMMIT 2026-05-21 08:43:07.003572 UTC

Radim

Thanks for the additional backtrace and WAL dump.  That makes the failure
mode much clearer.

The latest trace shows the startup process here:

  SimpleLruWriteAll(MultiXactOffsetCtl, false)
  RecordNewMultiXact(multi=79871, offset=218449, nmembers=2, ...)
  multixact_redo()

The WAL dump also shows the matching record:

  rmgr: MultiXact ... desc: CREATE_ID 79871 offset 218449 nmembers 2

79871 is the last multixact on its offsets page, so replaying that record
enters the next_pageno != pageno compatibility path added by 77dff5d937b. 
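
For anyone who wants to check the page arithmetic, here is a minimal sketch.
It assumes the default 8 kB block size and a 4-byte MultiXactOffset, which
gives 2048 offsets per page.  The toy_* names are made up for illustration
and are not the macros from multixact.c.

```c
#include <stdint.h>

/* Assumptions: default BLCKSZ (8192) and a 4-byte MultiXactOffset. */
#define TOY_BLCKSZ 8192
#define TOY_OFFSETS_PER_PAGE ((uint32_t) (TOY_BLCKSZ / sizeof(uint32_t)))

/* Offsets page that holds the entry for a given multixact ID. */
uint32_t
toy_offsets_page(uint32_t multi)
{
    return multi / TOY_OFFSETS_PER_PAGE;
}

/* Slot of that multixact ID within its offsets page. */
uint32_t
toy_offsets_entry(uint32_t multi)
{
    return multi % TOY_OFFSETS_PER_PAGE;
}

/* Nonzero when multi + 1 lives on a different page than multi. */
int
toy_next_is_on_new_page(uint32_t multi)
{
    return toy_offsets_page(multi + 1) != toy_offsets_page(multi);
}
```

With these assumptions, 79871 maps to page 38, entry 2047 (the last slot),
and 79872 maps to page 39, so the next_pageno != pageno test is true for
this record.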

On REL_14 through REL_16, RecordNewMultiXact() already holds
MultiXactOffsetSLRULock while executing that code.  SimpleLruWriteAll() then
tries to acquire MultiXactOffsetCtl's SLRU control lock, which is the same
MultiXactOffsetSLRULock on those branches.  That explains the standby startup
process waiting forever on LWLock:MultiXactOffsetSLRU, with no corresponding
SLRU I/O activity. 
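
To illustrate the pattern, here is a toy single-process model.  It is not
PostgreSQL code: the ToyLock type and all toy_* functions are hypothetical,
and the acquire function returns false at the point where a real
LWLockAcquire() would sleep instead.

```c
#include <stdbool.h>

/* Toy stand-in for a non-reentrant lock such as an LWLock (hypothetical). */
typedef struct ToyLock
{
    bool held;
} ToyLock;

/*
 * Returns false where a real LWLockAcquire() would sleep forever, because
 * the only process that could release the lock is the caller itself.
 */
bool
toy_acquire(ToyLock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

void
toy_release(ToyLock *lock)
{
    lock->held = false;
}

/* Models SimpleLruWriteAll(): takes the SLRU control lock on its own. */
bool
toy_write_all(ToyLock *control_lock)
{
    if (!toy_acquire(control_lock))
        return false;
    /* ... dirty pages would be written here ... */
    toy_release(control_lock);
    return true;
}

/*
 * Models the REL_14..REL_16 redo path: the caller already holds the
 * offsets lock, and the same lock is the control lock that
 * toy_write_all() tries to take.  Returns true if the call sequence
 * would self-deadlock.
 */
bool
toy_replay_self_deadlocks(bool call_write_all)
{
    ToyLock offsets_lock = {false};
    bool    ok = true;

    toy_acquire(&offsets_lock);
    if (call_write_all)
        ok = toy_write_all(&offsets_lock);
    toy_release(&offsets_lock);
    return !ok;
}
```

In this model, toy_replay_self_deadlocks(true) reports the self-deadlock and
toy_replay_self_deadlocks(false) does not, which is the difference between
keeping and dropping the flush call while the lock is held.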

I think the right fix is to remove that SimpleLruWriteAll() call while
keeping the missing-page initialization logic.  The flush is only meant to
make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers
but have not reached disk.  In this fallback path, I don't see a way for
the tested next_pageno to be in that state: if RecordNewMultiXact() itself
initializes the page, it writes it synchronously with SimpleLruWritePage()
before setting last_initialized_offsets_page.

I've attached a small patch for REL_16_STABLE.  The same self-deadlock
pattern is also present on PG 14 and 15.  PG 17 and 18 have the same
compatibility call, but SLRU locking is banked there, and
RecordNewMultiXact() does not appear to hold the relevant bank lock before
calling SimpleLruWriteAll().  I would therefore not describe those branches
as having this exact self-deadlock, but that needs more analysis.

I've added both Andrey and Heikki to the To: list, since I'm not sure
whether this is more severe than the multixact offset issue we had with
16.12, or on par with it.

Regards,
Ayush