16.14 regression: startup process self-deadlocks during multixact WAL replay in RecordNewMultiXact


From: Olegs Germanovs <[email protected]>
To: [email protected]
Subject: 16.14 regression: startup process self-deadlocks during multixact WAL replay in RecordNewMultiXact
Date: Wed, 27 May 2026 15:33:58 +0300
Message-ID: <CA+yEoBxD18Z3VxOwLfk+959giwrt=6Jo5HujnvyZZN2Y63TWBg@mail.gmail.com>

Hi!

*Bug summary:* After upgrading from 16.13 to 16.14, archive recovery of a
  base backup hangs indefinitely during multixact WAL replay. The startup
  process blocks acquiring MultiXactOffsetSLRULock in EXCLUSIVE mode while
  already holding one LWLock. The lock has shared_count=1 with no
  exclusive holder, no other live process appears to hold it, and the
  same recovery completes successfully on 16.13.

*Environment*:
  PostgreSQL:  16.14 (pgdg)
  OS:          Ubuntu 22.04, kernel 6.8.0-1016-aws
  Arch:        aarch64 (AWS Graviton)
  Backup tool: pgBackRest 2.53.1 (backup) → 2.58.0 (restore)
  Source:      x86_64 cluster, PostgreSQL 16.6 (Ubuntu 16.6-1.pgdg22.04+1)

*Scenario*: archive recovery from pgBackRest. End-of-backup record not yet
seen. Replay stalls in WAL segment 0000000100006BEB00000031.
Verified that the next segment is irrelevant: startup is not
waiting for WAL, it already has a record in hand (frozen on the same frame
across many gdb captures separated by minutes).

*Stack of startup process* (PID 395003):

  #7  LWLockAcquire (lock=0xfdbf33f2f000, mode=LW_EXCLUSIVE)
        at storage/lmgr/lwlock.c:1314
  #8  SimpleLruWriteAll (ctl=MultiXactOffsetCtlData, ...)
        at access/transam/slru.c:1174
  #9  RecordNewMultiXact (multi=981215231, offset=2282786137,
                          nmembers=2, members=...)
        at access/transam/multixact.c:944
  #10 multixact_redo (record=...)
        at access/transam/multixact.c:3464
  #11 ApplyWalRecord -> PerformWalRecovery -> StartupXLOG

LWLock state at 0xfdbf33f2f000 (stable across 5+ snapshots):
  tranche = 14 (MultiXactOffsetSLRU)
  state.value = 0x61000000
    = LW_FLAG_RELEASE_OK | LW_FLAG_HAS_WAITERS | shared_count=1
  waiters = {head=524, tail=524}   (one waiter)
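
The state word can be re-checked mechanically. The sketch below decodes it in Python, assuming the bit layout as I recall it from src/backend/storage/lmgr/lwlock.c in the 16 branch (flags in bits 28-30, exclusive sentinel at bit 24, shared count in the low 24 bits). The constant values are an assumption and should be confirmed against the source of the exact build; decode_lwlock_state is a hypothetical helper name.

```python
# Assumed bit layout of LWLock.state in the PostgreSQL 16 branch.
# These values are recalled from lwlock.c, not taken from the report;
# confirm them against the source of the running build.
LW_FLAG_HAS_WAITERS = 1 << 30
LW_FLAG_RELEASE_OK = 1 << 29
LW_FLAG_LOCKED = 1 << 28          # wait-list spinlock bit, not a holder
LW_VAL_EXCLUSIVE = 1 << 24        # set while an exclusive holder exists
LW_SHARED_MASK = (1 << 24) - 1    # low 24 bits count shared holders


def decode_lwlock_state(value: int) -> dict:
    """Split a raw LWLock state word into its flags and holder fields."""
    return {
        "has_waiters": bool(value & LW_FLAG_HAS_WAITERS),
        "release_ok": bool(value & LW_FLAG_RELEASE_OK),
        "wait_list_locked": bool(value & LW_FLAG_LOCKED),
        "exclusive": bool(value & LW_VAL_EXCLUSIVE),
        "shared_count": value & LW_SHARED_MASK,
    }


if __name__ == "__main__":
    for name, field in decode_lwlock_state(0x61000000).items():
        print(f"{name:18} {field}")
```

Under these assumed constants, the 0x01000000 component of 0x61000000 is bit 24, which would read as an exclusive holder with a shared count of 0 rather than shared_count=1. If that layout matches the build, the lock is already held in EXCLUSIVE mode. A single process re-requesting it would still block on itself, so the self-deadlock reading is unaffected, but the held mode is worth confirming from the mode field of held_lwlocks[0] in the startup process.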

Critical evidence: the startup process holds exactly one LWLock.
  num_held_lwlocks = 1

*Combined with*:
  - No exclusive holder of the lock
  - shared_count = 1
  - Checkpointer (PID 395001) and bgwriter (PID 395002) sitting idle
    in CheckpointerMain/BackgroundWriterMain WaitLatch loops, with no
    visible work pending
  - Same gdb stack frame frozen across captures separated by minutes
  - Zero CPU, zero I/O, ctx_switches not advancing

→ The startup process is holding MultiXactOffsetSLRULock in SHARED mode
  (acquired earlier in the RecordNewMultiXact path) and now requesting
  it in EXCLUSIVE mode via SimpleLruWriteAll. Since LWLocks cannot be
  upgraded shared→exclusive, this is a self-deadlock.
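
The reasoning above can be illustrated with a toy model. This is a minimal Python sketch of a non-reentrant shared/exclusive lock, not PostgreSQL code: ToyLWLock and is_self_deadlock are hypothetical names, and the only property modelled is that the lock decides from its own state, without checking whether the requester is already a holder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyLWLock:
    """Toy non-reentrant shared/exclusive lock (illustration only)."""
    exclusive_holder: Optional[int] = None
    shared_holders: List[int] = field(default_factory=list)

    def acquire(self, pid: int, mode: str) -> bool:
        """Return True if granted, False if the caller would have to sleep."""
        if mode == "exclusive":
            # Exclusive needs the lock completely free, whoever holds it.
            if self.exclusive_holder is not None or self.shared_holders:
                return False
            self.exclusive_holder = pid
            return True
        # Shared is only blocked by an exclusive holder.
        if self.exclusive_holder is not None:
            return False
        self.shared_holders.append(pid)
        return True


def is_self_deadlock(lock: ToyLWLock, waiter_pid: int) -> bool:
    """True if the waiter is the only holder, so nobody else can release."""
    holders = set(lock.shared_holders)
    if lock.exclusive_holder is not None:
        holders.add(lock.exclusive_holder)
    return holders == {waiter_pid}


if __name__ == "__main__":
    lock = ToyLWLock()
    startup = 395003
    print(lock.acquire(startup, "shared"))     # granted
    print(lock.acquire(startup, "exclusive"))  # would sleep
    print(is_self_deadlock(lock, startup))     # only holder is the waiter
```

The same outcome follows if the first acquisition is exclusive instead of shared: in both cases the only process able to release the lock is the one sleeping on it.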

Auxiliary process stacks (for completeness):

  Checkpointer (395001):
    epoll_pwait → WaitLatch (timeout=15000)
                → CheckpointerMain (checkpointer.c:535)
  Bgwriter (395002):
    epoll_pwait → WaitLatch (timeout=10000)
                → BackgroundWriterMain (bgwriter.c:336)

Both are idle in their main loops; held_lwlocks was <optimized out> in
gdb but neither process has any plausible reason to hold the SLRU lock.

pg_controldata excerpt:
  Database cluster state:           in archive recovery
  Backup start location:            6BEB/27000378
  Minimum recovery ending location: 6BEB/31DCEBE0
  Backup end location:              0/0
  End-of-backup record required:    yes
  NextMultiXactId:                  981215122 (replay reached 981215231)
  NextMultiOffset:                  2282785918 (replay reached 2282786137)
  oldestMultiXid:                   964544775
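
For reference, the position of the stuck multixact inside the offsets SLRU can be worked out from the numbers above. The sketch below assumes the default 8192-byte block size and a 4-byte MultiXactOffset (as in the 16 branch), which gives 2048 offsets per page. Neither value is stated in this report, and offsets_slot is a hypothetical helper name.

```python
BLCKSZ = 8192                  # assumed default block size
MULTIXACT_OFFSET_SIZE = 4      # assumed sizeof(MultiXactOffset) in 16
OFFSETS_PER_PAGE = BLCKSZ // MULTIXACT_OFFSET_SIZE  # 2048 under these assumptions


def offsets_slot(multi: int) -> tuple:
    """Return (page, entry) of a MultiXactId in the offsets SLRU."""
    return multi // OFFSETS_PER_PAGE, multi % OFFSETS_PER_PAGE


if __name__ == "__main__":
    replayed_multi = 981215231      # multi= in frame #9
    control_multi = 981215122       # NextMultiXactId in pg_controldata
    replayed_offset = 2282786137    # offset= in frame #9
    control_offset = 2282785918     # NextMultiOffset in pg_controldata

    print(offsets_slot(replayed_multi))      # (479108, 2047)
    print(offsets_slot(replayed_multi + 1))  # (479109, 0)
    print(replayed_multi - control_multi)    # 109
    print(replayed_offset - control_offset)  # 219
```

Under those assumptions, multixact 981215231 is the last entry (2047) on offsets page 479108, and the next multixact would be the first entry on page 479109. Replay had advanced 109 multixacts and 219 member offsets past the pg_controldata values when it stalled. Whether the page boundary is related to the hang is not established by anything above; it is only a property of the numbers.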

Reproduction:
  - Restore basebackup + WAL via pgBackRest archive-get on aarch64
  - Start cluster on 16.14: hangs as described, every time, same WAL
    position
  - Stop cluster, downgrade to 16.13 (same pgdg apt source), start:
    recovery completes successfully on identical PGDATA
  - No data or environment change between the two attempts

I'm happy to apply test patches or capture additional diagnostics.

Best wishes
Olegs Germanovs
