public inbox for [email protected]  
16.14 regression: startup process self-deadlocks during multixact WAL replay in RecordNewMultiXact


From: Olegs Germanovs @ 2026-05-27 12:33 UTC (permalink / raw)
  To: [email protected]

Hi!


*Bug summary:* After upgrading from 16.13 to 16.14, archive recovery of a
  basebackup hangs indefinitely during multixact WAL replay. The startup
  process blocks acquiring MultiXactOffsetSLRULock in EXCLUSIVE mode while
  already holding one LWLock. The lock has shared_count=1 with no
  exclusive holder, no other live process appears to hold it, and the
  same recovery completes successfully on 16.13.

*Environment*:
  PostgreSQL:  16.14 (pgdg)
  OS:          Ubuntu 22.04, kernel 6.8.0-1016-aws
  Arch:        aarch64 (AWS Graviton)
  Backup tool: pgBackRest 2.53.1 (backup) → 2.58.0 (restore)
  Source:      x86_64 cluster, PostgreSQL 16.6 (Ubuntu 16.6-1.pgdg22.04+1)

*Scenario*: archive recovery from a pgBackRest backup; the end-of-backup
record has not been reached yet. Replay stalls in WAL segment
0000000100006BEB00000031. The next segment is not the issue: the startup
process is not waiting for WAL, it already has a record in hand (frozen
on the same frame across many gdb captures taken minutes apart).

*Stack of startup process* (PID 395003):

  #7  LWLockAcquire (lock=0xfdbf33f2f000, mode=LW_EXCLUSIVE)
        at storage/lmgr/lwlock.c:1314
  #8  SimpleLruWriteAll (ctl=MultiXactOffsetCtlData, ...)
        at access/transam/slru.c:1174
  #9  RecordNewMultiXact (multi=981215231, offset=2282786137,
                          nmembers=2, members=...)
        at access/transam/multixact.c:944
  #10 multixact_redo (record=...)
        at access/transam/multixact.c:3464
  #11 ApplyWalRecord -> PerformWalRecovery -> StartupXLOG

LWLock state at 0xfdbf33f2f000 (stable across 5+ snapshots):
  tranche = 14 (MultiXactOffsetSLRU)
  state.value = 0x61000000
    = LW_FLAG_RELEASE_OK | LW_FLAG_HAS_WAITERS | shared_count=1
  waiters = {head=524, tail=524}   (one waiter)

Critical evidence — startup process holds exactly one LWLock:
  num_held_lwlocks = 1

*Combined with*:
  - No exclusive holder of the lock
  - shared_count = 1
  - Checkpointer (PID 395001) and bgwriter (PID 395002) sitting idle
    in CheckpointerMain/BackgroundWriterMain WaitLatch loops, with no
    visible work pending
  - Same gdb stack frame frozen across captures separated by minutes
  - Zero CPU, zero I/O, ctx_switches not advancing

→ The startup process is holding MultiXactOffsetSLRULock in SHARED mode
  (acquired earlier in the RecordNewMultiXact path) and now requesting
  it in EXCLUSIVE mode via SimpleLruWriteAll. Since LWLocks cannot be
  upgraded shared→exclusive, this is a self-deadlock.
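
The reasoning above can be sketched with a toy model. This is only an
illustration of why a non-reentrant shared/exclusive lock cannot be upgraded
by its own holder; ToyLock, toy_try_acquire and toy_release are hypothetical
names, not PostgreSQL's LWLock implementation, and the acquire is a
non-blocking "try" so the example returns instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: NOT PostgreSQL's LWLock code. All names are hypothetical. */
typedef enum { TOY_SHARED, TOY_EXCLUSIVE } ToyMode;

typedef struct {
    int  shared_count;  /* number of shared holders */
    bool exclusive;     /* true while an exclusive holder exists */
} ToyLock;

/*
 * Non-blocking attempt. Returns true when the lock is granted.
 * The lock does not track who holds it, so a request from the current
 * holder is treated like a request from anyone else.
 */
static bool toy_try_acquire(ToyLock *lock, ToyMode mode)
{
    if (mode == TOY_EXCLUSIVE) {
        /* Exclusive needs the lock completely free. */
        if (lock->exclusive || lock->shared_count > 0)
            return false;
        lock->exclusive = true;
        return true;
    }
    /* Shared is compatible with other shared holders only. */
    if (lock->exclusive)
        return false;
    lock->shared_count++;
    return true;
}

static void toy_release(ToyLock *lock, ToyMode mode)
{
    if (mode == TOY_EXCLUSIVE) {
        assert(lock->exclusive);
        lock->exclusive = false;
    } else {
        assert(lock->shared_count > 0);
        lock->shared_count--;
    }
}
```

With one shared hold in place, toy_try_acquire(&lock, TOY_EXCLUSIVE) keeps
returning false until that hold is released. In a blocking implementation
the holder would sleep at that point and never reach its own release. In
this model a second exclusive request from an exclusive holder fails the
same way.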

Auxiliary process stacks (for completeness):

  Checkpointer (395001):
    epoll_pwait → WaitLatch (timeout=15000)
                → CheckpointerMain (checkpointer.c:535)
  Bgwriter (395002):
    epoll_pwait → WaitLatch (timeout=10000)
                → BackgroundWriterMain (bgwriter.c:336)

Both are idle in their main loops; held_lwlocks was <optimized out> in
gdb but neither process has any plausible reason to hold the SLRU lock.

pg_controldata excerpt:
  Database cluster state:           in archive recovery
  Backup start location:            6BEB/27000378
  Minimum recovery ending location: 6BEB/31DCEBE0
  Backup end location:              0/0
  End-of-backup record required:    yes
  NextMultiXactId:                  981215122 (replay reached 981215231)
  NextMultiOffset:                  2282785918 (replay reached 2282786137)
  oldestMultiXid:                   964544775
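
For anyone reading the numbers above, here is a minimal sketch of how a
multixact ID maps to a page and slot of the offsets SLRU. It mirrors the
arithmetic of the MultiXactIdToOffsetPage / MultiXactIdToOffsetEntry macros
in multixact.c, but the function names below are my own, and the 8 kB block
size passed in the example is an assumption (the default BLCKSZ); a build
with a different block size gives different results.

```c
#include <assert.h>
#include <stdint.h>

/* Both types are 32-bit unsigned integers in PostgreSQL 16. */
typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

/* Number of offset entries that fit on one SLRU page of blcksz bytes. */
static uint32_t offsets_per_page(uint32_t blcksz)
{
    assert(blcksz >= sizeof(MultiXactOffset));
    return blcksz / (uint32_t) sizeof(MultiXactOffset);
}

/* Offsets SLRU page that holds the entry for this multixact ID. */
static uint32_t multi_to_offset_page(MultiXactId multi, uint32_t blcksz)
{
    return multi / offsets_per_page(blcksz);
}

/* Slot within that page. */
static uint32_t multi_to_offset_entry(MultiXactId multi, uint32_t blcksz)
{
    return multi % offsets_per_page(blcksz);
}
```

Assuming 8192-byte blocks, multi=981215231 from frame #9 lands on page
479108, entry 2047, which is the last slot of that page; the next multixact
ID, 981215232, is entry 0 of page 479109. Whether that page boundary is
related to the hang is not established by anything in this report.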

Reproduction:
  - Restore basebackup + WAL via pgBackRest archive-get on aarch64
  - Start cluster on 16.14: hangs as described, every time, same WAL
    position
  - Stop cluster, downgrade to 16.13 (same pgdg apt source), start:
    recovery completes successfully on identical PGDATA
  - No data or environment change between the two attempts

I'm happy to apply test patches or capture additional diagnostics.

Best wishes
Olegs Germanovs




From: Andrey Borodin @ 2026-05-27 12:48 UTC (permalink / raw)
  To: Olegs Germanovs <[email protected]>; +Cc: PostgreSQL mailing lists <[email protected]>



> On 27 May 2026, at 17:33, Olegs Germanovs <[email protected]> wrote:
> 
> After upgrading from 16.13 to 16.14, archive recovery of a basebackup
>   hangs indefinitely during multixact WAL replay.

Hi Olegs!

Thanks for the detailed report! Your analysis of the self-deadlock is spot on.

The fix for this problem has already been committed to REL_16_STABLE as 42a3194e5483 [0].
It was discussed on the pgsql-bugs thread "BUG #19490: Streaming standby on 16.14 stops
applying WAL on MultiXactOffsetSLRU when primary is 16.8" [1].

Please let us know if you still observe the problem or any other unusual behavior.


Best regards, Andrey Borodin.



[0] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=42a3194e548349b658a808...
[1] https://www.postgresql.org/message-id/flat/46FE61C9-F273-45FD-BED7-0F8CDA6EB992%40yandex-team.ru#69d...




