Altough the culprit is known, I've got more data as requested.

#0  0x00007f20e9bdb687 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f20e9bdbc8c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f20e9be6920 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x000055a71796e3ca in PGSemaphoreLock (sema=0x7f20de6d0e38) at ./build/src/backend/port/pg_sema.c:327
#4  0x000055a7179f57ed in LWLockAcquire (lock=0x7f20de6d1800, mode=mode@entry=LW_EXCLUSIVE) at ./build/../src/backend/storage/lmgr/lwlock.c:1314
#5  0x000055a71772dfb2 in SimpleLruWriteAll (ctl=ctl@entry=0x55a717e83040 <MultiXactOffsetCtlData>, allow_redirtied=allow_redirtied@entry=false) at
./build/../src/backend/access/transam/slru.c:1174
#6  0x000055a717727b6f in RecordNewMultiXact (multi=79871, offset=218449, nmembers=2, members=members@entry=0x7f20de6831ec) at
./build/../src/backend/access/transam/multixact.c:944
#7  0x000055a71772a983 in multixact_redo (record=0x55a73a8d0fc8) at ./build/../src/backend/access/transam/multixact.c:3464
#8  0x000055a71774d9b8 in ApplyWalRecord (xlogreader=<optimized out>, record=0x7f20de6831b0, replayTLI=<synthetic pointer>) at
./build/../src/backend/access/transam/xlogrecovery.c:1951
#9  PerformWalRecovery () at ./build/../src/backend/access/transam/xlogrecovery.c:1782
#10 0x000055a717740def in StartupXLOG () at ./build/../src/backend/access/transam/xlog.c:5452
#11 0x000055a71797c7e4 in StartupProcessMain () at ./build/../src/backend/postmaster/startup.c:282
#12 0x000055a717972b20 in AuxiliaryProcessMain (auxtype=auxtype@entry=StartupProcess) at ./build/../src/backend/postmaster/auxprocess.c:141
#13 0x000055a717977db3 in StartChildProcess (type=StartupProcess) at ./build/../src/backend/postmaster/postmaster.c:5381
#14 0x000055a71797bfb8 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x55a73a8d0590) at ./build/../src/backend/postmaster/postmaster.c:1463
#15 0x000055a7176a05bc in main (argc=1, argv=0x55a73a8d0590) at ./build/../src/backend/main/main.c:200

and WAL dump

rmgr: Btree       len (rec/tot):     64/    64, tx:     336098, lsn: 1/32DE75F0, prev 1/32DE7580, desc: INSERT_LEAF off: 244, blkref #0: rel 1663/16384/16432 blk 536
rmgr: MultiXact   len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7630, prev 1/32DE75F0, desc: CREATE_ID 79871 offset 218449 nmembers 2: 336089 (keysh)
336098 (keysh)
rmgr: Heap        len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7668, prev 1/32DE7630, desc: LOCK xmax: 79871, off: 1, infobits: [IS_MULTI, LOCK_ONLY,
KEYSHR_LOCK], flags: 0x00, blkref #0: rel 1663/16384/16418 blk 0
rmgr: Heap        len (rec/tot):     72/    72, tx:     336096, lsn: 1/32DE76A0, prev 1/32DE7668, desc: HOT_UPDATE old_xmax: 336096, old_off: 52, old_infobits: [],
flags: 0x20, new_xmax: 0, new_off: 149, blkref #0: rel 1663/16384/16401 blk 22
rmgr: Heap        len (rec/tot):     71/    71, tx:     336096, lsn: 1/32DE76E8, prev 1/32DE76A0, desc: HOT_UPDATE old_xmax: 336096, old_off: 149, old_infobits: [],
flags: 0x60, new_xmax: 0, new_off: 209, blkref #0: rel 1663/16384/16399 blk 6
rmgr: Heap        len (rec/tot):     79/    79, tx:     336096, lsn: 1/32DE7730, prev 1/32DE76E8, desc: INSERT off: 150, flags: 0x00, blkref #0: rel 1663/16384/16417
blk 741
rmgr: Heap        len (rec/tot):     72/    72, tx:     336097, lsn: 1/32DE7780, prev 1/32DE7730, desc: HOT_UPDATE old_xmax: 336097, old_off: 243, old_infobits: [],
flags: 0x20, new_xmax: 0, new_off: 228, blkref #0: rel 1663/16384/16401 blk 26
rmgr: Transaction len (rec/tot):     34/    34, tx:     336096, lsn: 1/32DE77C8, prev 1/32DE7780, desc: COMMIT 2026-05-21 08:43:07.003572 UTC

Radim

On Thu, 21 May 2026 at 10:34, Radim Marek <radim@boringsql.com> wrote:
Thank you for the follow-up. In mean-time I can confirm the commit 77dff5d937b1 might be the source of the original reported issue.

Unfortunately pinning version down to 16.12 only avoids the MultiXactOffsetSLRU self-deadlock, but the standby then fails recovery after 12+ hours.

FATAL: could not access status of transaction 24958976 DETAIL: Could not read from file "pg_multixact/offsets/017C" at offset 221184: read too few bytes. CONTEXT: WAL redo at 14770/873268E8 for MultiXact/CREATE_ID: 24958975 offset 61500431 nmembers 2: 3058927188 (fornokeyupd) 3058927189 (keysh)

We are going to try to pin 16.13 and try that before we can safely upgrade of the primary/are confident we have working PITR recovery available should we need it.

Radim

PS: Once I have some time I will try to setup a docker based harness to be able to replicate original problem for later testing of the fix.

On Thu, 21 May 2026 at 09:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 21 May 2026, at 00:12, Marko Tiikkaja <marko@joh.to> wrote:
>
> #8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400

Thanks!

This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
If by chance you will have a backtrace of another deadlocking process -
please post it.

But it's not strictly necessary for analysis, I think we can figure out what
happened from the backtrace you already posted.


Best regards, Andrey Borodin.