Hi,

On Thu, 21 May 2026 at 12:55, Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> > On 21 May 2026, at 00:12, Marko Tiikkaja <marko@joh.to> wrote:
> >
> > #8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400
>
> Thanks!
>
> This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
> If by chance you will have a backtrace of another deadlocking process -
> please post it.
>
> But it's not strictly necessary for analysis, I think we can figure out what
> happened from the backtrace you already posted.

I had a look at the code that Marko's backtrace points at, and I
believe this is a straightforward self-deadlock introduced by
77dff5d937b.

In RecordNewMultiXact() on REL_16_STABLE:

  LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);

  ...

  if (InRecovery && next_pageno != pageno)
  {
      ...
      if (last_initialized_offsets_page == -1)
      {
          SimpleLruWriteAll(MultiXactOffsetCtl, false);  /* <-- here */
          init_needed = !SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, next_pageno);
      }
      else
          init_needed = (last_initialized_offsets_page == pageno);
      ...
  }

The outer LWLockAcquire() takes MultiXactOffsetSLRULock in LW_EXCLUSIVE mode.
SimpleLruWriteAll() in REL_16_STABLE then does

  LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

and for the MultiXactOffsetCtl SLRU, shared->ControlLock is
MultiXactOffsetSLRULock (set up by SimpleLruInit(... MultiXactOffsetSLRULock ...)).
So it tries to take the very lock the same backend already holds.
LWLockAcquire does not detect that and parks the process on
LWLock:MultiXactOffsetSLRU forever.
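
To make the aliasing concrete, here is a toy model in C. All names
(ToyLock, ToySlruCtl, toy_*) are hypothetical and none of this is
PostgreSQL code. Because the toy is single-threaded, a lock that is
already held can only be held by the caller, so toy_lock_acquire()
returns false at exactly the point where the real LWLockAcquire() would
sleep forever. That lets the sketch report the self-deadlock instead of
hanging.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, hypothetical names only (not PostgreSQL APIs).  ToySlruCtl's
 * control_lock points at a lock owned elsewhere, the way SimpleLruInit()
 * makes shared->ControlLock refer to MultiXactOffsetSLRULock.
 */
typedef struct ToyLock
{
    const char *name;
    bool        held_exclusive;
} ToyLock;

typedef struct ToySlruCtl
{
    ToyLock    *control_lock;   /* may alias a lock the caller holds */
} ToySlruCtl;

/*
 * Single-threaded toy: if the lock is already held, the caller holds it,
 * so returning false marks where a real LWLockAcquire() would sleep forever.
 */
static bool
toy_lock_acquire(ToyLock *lock)
{
    if (lock->held_exclusive)
        return false;
    lock->held_exclusive = true;
    return true;
}

static void
toy_lock_release(ToyLock *lock)
{
    assert(lock->held_exclusive);
    lock->held_exclusive = false;
}

/* Stand-in for SimpleLruWriteAll(): it starts by taking ctl->control_lock. */
static bool
toy_write_all(ToySlruCtl *ctl)
{
    if (!toy_lock_acquire(ctl->control_lock))
        return false;           /* self-deadlock in the real code */
    /* ... write out dirty pages here ... */
    toy_lock_release(ctl->control_lock);
    return true;
}

/*
 * Stand-in for the RecordNewMultiXact() path: hold offset_lock, then flush.
 * Returns false if the flush would have blocked on a lock we already hold.
 */
static bool
toy_record_new_multixact(ToySlruCtl *offsets_ctl, ToyLock *offset_lock)
{
    bool        flushed;

    if (!toy_lock_acquire(offset_lock))     /* outer LW_EXCLUSIVE acquire */
        return false;
    flushed = toy_write_all(offsets_ctl);
    toy_lock_release(offset_lock);
    return flushed;
}
```

When offsets_ctl.control_lock points at the same ToyLock passed as
offset_lock, toy_record_new_multixact() returns false. With a distinct
control lock it returns true. The outcome depends only on the aliasing,
not on anything the flush itself does.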

That matches every symptom in the report:

  - wait_event = LWLock:MultiXactOffsetSLRU.
  - pg_stat_slru shows zero MultiXact activity, because
    SimpleLruWriteAll() never gets past its initial LWLockAcquire() to
    actually write a page.
  - A restart unwedges things briefly.
  - The deadlock only triggers when the next multixact's offset falls on
    a new offsets page and last_initialized_offsets_page is still -1,
    i.e. before any XLOG_MULTIXACT_ZERO_OFF_PAGE record has been replayed
    in this recovery session.  That can happen at most once per startup,
    which is consistent with the "recurs after catch-up" behaviour.
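
The excerpted branch is only entered when next_pageno != pageno, i.e.
when the multixact being recorded is the last one on its offsets page.
A rough sketch of that trigger condition follows. It assumes the REL_16
defaults (BLCKSZ = 8192 and a 4-byte MultiXactOffset, so 2048 offsets
per page). It also assumes next_pageno is computed from multi + 1,
skipping InvalidMultiXactId on wraparound. The toy_* helpers are
hypothetical re-implementations of the page arithmetic, not the macros
in multixact.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumptions: REL_16 defaults (BLCKSZ = 8192, 4-byte MultiXactOffset), so
 * 2048 offsets per page.  toy_* are hypothetical local helpers that only
 * mimic the page arithmetic in multixact.c.
 */
#define TOY_BLCKSZ              8192
#define TOY_OFFSETS_PER_PAGE    (TOY_BLCKSZ / sizeof(uint32_t))
#define TOY_FIRST_MULTIXACT_ID  ((uint32_t) 1)

/* Offsets page holding the starting offset of a multixact. */
static int64_t
toy_offset_page(uint32_t multi)
{
    return (int64_t) (multi / TOY_OFFSETS_PER_PAGE);
}

/* Following multixact, skipping 0 (invalid) on wraparound. */
static uint32_t
toy_next_multi(uint32_t multi)
{
    uint32_t    next = multi + 1;

    if (next < TOY_FIRST_MULTIXACT_ID)
        next = TOY_FIRST_MULTIXACT_ID;
    return next;
}

/* True when the excerpted branch would call SimpleLruWriteAll(). */
static bool
toy_reaches_write_all(bool in_recovery, uint32_t multi,
                      int64_t last_initialized_offsets_page)
{
    int64_t     pageno = toy_offset_page(multi);
    int64_t     next_pageno = toy_offset_page(toy_next_multi(multi));

    return in_recovery && next_pageno != pageno &&
        last_initialized_offsets_page == -1;
}
```

For example, recording multixact 2047 in recovery with
last_initialized_offsets_page == -1 reaches the flush, while 2046 does
not, and neither does 2047 once any offsets page has been initialized.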

Is the "safety flush" that the comment justifies actually needed?
Every offsets page that this code path initializes is synchronously written
via SimpleLruWritePage() a few lines below the SimpleLruZeroPage() call,
with an Assert that the page is clean afterwards.  So at the moment
we call SimpleLruDoesPhysicalPageExist(), there shouldn't be a relevant
dirty offsets page in the SLRU buffer cache that would lead to a
false negative.  Dropping the SimpleLruWriteAll() call therefore
removes the self-deadlock without changing correctness.
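
The argument can be illustrated with another toy model (hypothetical
names, not the slru.c API). An existence check that only looks at
"disk" gives a false negative for a page that has been zeroed but is
still dirty, and ruling that out is the only thing a write-all flush
buys here. If every initialization follows the zero-then-write pattern
described above, nothing is left dirty and the check is accurate
without a flush. The sketch only models pages created by this one path
and says nothing about pages zeroed elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Toy SLRU, hypothetical names only (not the slru.c API): a page can be
 * dirty in the buffer cache and/or present on "disk".  The existence check
 * looks at disk only, like SimpleLruDoesPhysicalPageExist() looking at the
 * segment file, so a zeroed-but-unwritten page is a false negative.
 */
#define TOY_NPAGES 8

typedef struct ToySlru
{
    bool        dirty[TOY_NPAGES];
    bool        on_disk[TOY_NPAGES];
} ToySlru;

static void
toy_slru_init(ToySlru *slru)
{
    memset(slru, 0, sizeof(*slru));
}

static void
toy_zero_page(ToySlru *slru, int pageno)
{
    slru->dirty[pageno] = true;     /* zeroed in memory only */
}

static void
toy_write_page(ToySlru *slru, int pageno)
{
    slru->on_disk[pageno] = true;
    slru->dirty[pageno] = false;
}

static bool
toy_page_exists_on_disk(const ToySlru *slru, int pageno)
{
    return slru->on_disk[pageno];
}

/* Zero, write synchronously, then check the page is clean. */
static void
toy_init_page(ToySlru *slru, int pageno)
{
    toy_zero_page(slru, pageno);
    toy_write_page(slru, pageno);
    assert(!slru->dirty[pageno]);
}

/* True if a write-all flush would change what is on disk. */
static bool
toy_any_dirty(const ToySlru *slru)
{
    for (int i = 0; i < TOY_NPAGES; i++)
    {
        if (slru->dirty[i])
            return true;
    }
    return false;
}
```

After toy_init_page(), toy_any_dirty() is false, so a flush would be a
no-op and the on-disk check already answers correctly. Only a page
zeroed without the follow-up write (toy_zero_page() alone) produces the
false negative the flush is meant to guard against.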

Maybe I'm missing something here. Thoughts?

Regards,
Ayush