BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8

public inbox for [email protected]

9+ messages / 6 participants

* BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-20 21:16  PG Bug reporting form <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: PG Bug reporting form @ 2026-05-20 21:16 UTC
  To: [email protected]; +Cc: [email protected]

The following bug has been logged on the website:

Bug reference:      19490
Logged by:          Radim Marek
Email address:      [email protected]
PostgreSQL version: 16.14
Operating system:   Linux - Ubuntu 22.04
Description:        

Hello, 
due to a mistake, we ran a standby on a higher 16.x minor version against the
non-upgraded primary. This led to repeated issues with WAL replay.

Description:

A streaming replication standby running 16.14 stops advancing replay while
WAL keeps arriving from a 16.8 primary. The startup process is parked in
futex_wait_queue with wait_event = LWLock:MultiXactOffsetSLRU and no longer
makes progress.

pg_stat_slru shows zero MultiXact activity over the same window, so it
appears to stop on the lock itself rather than inside any SLRU read/write
path. Downgrading the standby binary to 16.12 (same data directory) resolved
the symptom under the same workload.

Configuration:

Primary: 16.8-1.pgdg22.04+1; observed both under load and while "relatively"
idle (below 1000 QPS).
Replica: 16.14-1.pgdg22.04+1, physical streaming, async, no cascading. Only
this single replica was on 16.14, due to misconfiguration; the other replicas
(running 16.8) were not affected.

hot_standby_feedback enabled, logical replication from the primary. Default
WAL segment size. Default SLRU buffer sizes.

Observed symptoms on the standby

1. pg_stat_replication on the primary, affected node only

client_addr  state      sent_lag  write_lag  flush_lag  replay_lag_bytes  replay_lag
10.x.x.x     streaming  0         0          0          8766784344        02:42:50

2. Receive/write/flush all at the primary's current LSN; only replay is far
behind and growing.

3. Startup process wait event on the standby (sampled repeatedly, always
identical)

pid    wait_event_type  wait_event           state
19095  LWLock           MultiXactOffsetSLRU  (null)

4. Kernel stack of the startup process
cat /proc/19095/stack
[<0>] futex_wait_queue+0x67/0xa0
[<0>] __futex_wait+0x155/0x1d0
[<0>] futex_wait+0x74/0x120
[<0>] do_futex+0x16d/0x230
[<0>] __x64_sys_futex+0x95/0x200
[<0>] x64_sys_call+0x117b/0x2480
[<0>] do_syscall_64+0x81/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80
cat /proc/19095/wchan
futex_wait_queue

5. pg_stat_slru on the standby, after pg_stat_reset_slru(NULL) and a
60-second wait under live WAL streaming
name             blks_zeroed  blks_hit  blks_read  blks_written
MultiXactMember  0            0         0          0
MultiXactOffset  0            0         0          0

6. There was no MultiXact SLRU activity while the startup process was
reportedly waiting on the MultiXact offset SLRU lock.

7. Replay LSN frozen, receive LSN advancing. Sampled 60 sec apart.
recv             replay          lag_bytes
1476A/D1DA158    14767/EE01DB78  9111848416
1476A/EB565D0    14767/EE01DB78  9138571864

8. No replay progress; ~9 GB of WAL buffered locally that is never applied.

9. Other backends on the standby: only a diagnostic psql client. No
hot-standby readers.

10. MultiXact age on the primary is small (~360k on most DBs, ~239k on the
main DB). No MultiXact storm.
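The lag_bytes values in item 7 are plain 64-bit differences between LSNs written as two hex halves (upper/lower 32 bits), the same quantity pg_wal_lsn_diff() reports on the server. A minimal C sketch, with hypothetical toy_* helper names (not a PostgreSQL API), that reproduces both samples and shows the receiver advancing 26,723,448 bytes between them while replay stays put:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a textual LSN such as "1476A/D1DA158" (upper and lower 32 bits in
 * hex, separated by '/') into a 64-bit WAL position.  Hypothetical helper.
 * Returns 1 on success, 0 on malformed input.
 */
int
toy_parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi;
    unsigned int lo;

    assert(text != NULL && lsn != NULL);
    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return 0;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 1;
}

/* Byte distance a - b, the same quantity pg_wal_lsn_diff(a, b) computes. */
int64_t
toy_lsn_diff(uint64_t a, uint64_t b)
{
    return (int64_t) (a - b);
}
```

Feeding it the receive and replay columns from item 7 gives 9111848416 and 9138571864 bytes, matching the sampled lag_bytes.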

Workarounds

- Restarting the standby cleared the hang, but once it caught up, the hang
  recurred.
- Downgrading the standby binary to 16.12 (16.12-1.pgdg22.04+1) against
  the same data directory restored normal replay. After 60s under the same
  workload, pg_stat_slru shows only 2 hits / 0 reads on MultiXact.

I understand that running a primary 6 minor versions behind is not a
particularly good setup, but given that this is a supported upgrade direction,
it might be worth at least a mention in the 16.13/16.14 release notes.

---

Hope this helps,
Radim

^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-21 07:07  Andrey Borodin <[email protected]>
  parent: PG Bug reporting form <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Andrey Borodin @ 2026-05-21 07:07 UTC
  To: [email protected]; PostgreSQL mailing lists <[email protected]>

Thanks for the report!

Oh, this seems to be from the "gift that keeps on giving" department.
Related to [0]

> On 20 May 2026, at 14:16, PG Bug reporting form <[email protected]> wrote:
> 
> Downgrading the standby binary to 16.12 (16.12-1.pgdg22.04+1) against
> the same data directory restored normal replay. After 60s under the same
> workload pg_stat_slru shows only 2 hits / 0 reads on MultiXact.

Are you sure that it's not 16.11 that is resolving the problem?
Can you get a backtrace of the hanging startup process with debug symbols? Or obtain
the last replayed LSN and do a WAL dump around the point where startup deadlocked.

I don't see how this might be a result of [1] and [2], so perhaps it's some further peculiarity
from [3]. But 16.12 has [3]...


Best regards, Andrey Borodin.


[0] https://www.postgresql.org/message-id/flat/CACV2tSw3VYS7d27ftO_cs%2BaF3M54%2BJwWBbqSGLcKoG9cvyb6EA%4...
[1] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=77dff5d937b192b85c55bc...
[2] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=23064542f8bdcbc4b6a513...
[3] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=6351669130782ed01eed3a...





* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-21 07:12  Marko Tiikkaja <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Marko Tiikkaja @ 2026-05-21 07:12 UTC
  To: Andrey Borodin <[email protected]>; +Cc: [email protected]; PostgreSQL mailing lists <[email protected]>

Hi Andrey,

On Thu, May 21, 2026 at 10:07 AM Andrey Borodin <[email protected]> wrote:
> Are you sure that it's not 16.11 that is resolving the problem?
> Can you get a backtrace of hanging startup process with debug symbols?

We had this problem just this morning:

#0  __futex_abstimed_wait_common64 (private=<optimized out>,
cancel=true, abstime=0x0, op=265, expected=0,
futex_word=0x785c290170b8) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized
out>, abstime=0x0, clockid=0, expected=0, futex_word=0x785c290170b8)
at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64
(futex_word=futex_word@entry=0x785c290170b8,
expected=expected@entry=0, clockid=clockid@entry=0,
abstime=abstime@entry=0x0,
    private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000786048c9cbdf in do_futex_wait (sem=sem@entry=0x785c290170b8,
abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000786048c9cc78 in __new_sem_wait_slow64
(sem=sem@entry=0x785c290170b8, abstime=0x0, clockid=0) at
./nptl/sem_waitcommon.c:183
#5  0x0000786048c9ccf1 in __new_sem_wait
(sem=sem@entry=0x785c290170b8) at ./nptl/sem_wait.c:42
#6  0x0000654c8b150b86 in PGSemaphoreLock (sema=0x785c290170b8) at
port/pg_sema.c:327
#7  LWLockAcquire (lock=0x785c29017a80, mode=LW_EXCLUSIVE) at
storage/lmgr/./build/../src/backend/storage/lmgr/lwlock.c:1314
#8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400
<MultiXactOffsetCtlData.lto_priv.0>, allow_redirtied=<optimized out>)
    at access/transam/./build/../src/backend/access/transam/slru.c:1174
#9  0x0000654c8ae22719 in RecordNewMultiXact (multi=1201227775,
offset=2755202388, nmembers=2, members=0x7860465ec28c)
    at access/transam/./build/../src/backend/access/transam/multixact.c:944
#10 0x0000654c8ae255c6 in multixact_redo (record=0x654cb292c620) at
access/transam/./build/../src/backend/access/transam/multixact.c:3464
#11 0x0000654c8ae4ea2d in ApplyWalRecord (replayTLI=<synthetic
pointer>, record=0x7860465ec250, xlogreader=<optimized out>)
    at access/transam/./build/../src/include/access/xlog_internal.h:379
#12 PerformWalRecovery () at
access/transam/./build/../src/backend/access/transam/xlogrecovery.c:1782
#13 0x0000654c8ae3bcb7 in StartupXLOG () at
access/transam/./build/../src/backend/access/transam/xlog.c:5452
#14 0x0000654c8b0cbe7b in StartupProcessMain () at
postmaster/./build/../src/backend/postmaster/startup.c:282

We downgraded to 16.13 and the problem went away.


.m






* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-21 07:25  Andrey Borodin <[email protected]>
  parent: Marko Tiikkaja <[email protected]>
  0 siblings, 2 replies; 9+ messages in thread

From: Andrey Borodin @ 2026-05-21 07:25 UTC
  To: Marko Tiikkaja <[email protected]>; +Cc: [email protected]; PostgreSQL mailing lists <[email protected]>

> On 21 May 2026, at 00:12, Marko Tiikkaja <[email protected]> wrote:
> 
> #8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400

Thanks!

This clearly points to SimpleLruWriteAll(), added in 77dff5d937b1.
If by chance you have a backtrace of another deadlocked process,
please post it.

But it's not strictly necessary for analysis, I think we can figure out what
happened from the backtrace you already posted.

Best regards, Andrey Borodin.


* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-21 07:45  Ayush Tiwari <[email protected]>
  parent: Andrey Borodin <[email protected]>
  1 sibling, 0 replies; 9+ messages in thread

From: Ayush Tiwari @ 2026-05-21 07:45 UTC
  To: Andrey Borodin <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; [email protected]; PostgreSQL mailing lists <[email protected]>

Hi,

On Thu, 21 May 2026 at 12:55, Andrey Borodin <[email protected]> wrote:

>
>
> > On 21 May 2026, at 00:12, Marko Tiikkaja <[email protected]> wrote:
> >
> > #8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400
>
> Thanks!
>
> This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
> If by chance you will have a backtrace of another deadlocking process -
> please post it.
>
> But it's not strictly necessary for analysis, I think we can figure out
> what happened from the backtrace you already posted.
>

I had a look at the code that Marko's backtrace pointed at and I
believe this is a straightforward self-deadlock introduced by
77dff5d937b.

In RecordNewMultiXact() on REL_16_STABLE:

  LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);

  ...

  if (InRecovery && next_pageno != pageno)
  {
      ...
      if (last_initialized_offsets_page == -1)
      {
          SimpleLruWriteAll(MultiXactOffsetCtl, false);  /* <-- here */
          init_needed = !SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl,
                                                        next_pageno);
      }
      else
          init_needed = (last_initialized_offsets_page == pageno);
      ...
  }

The outer LWLockAcquire takes MultiXactOffsetSLRULock EXCLUSIVE.
SimpleLruWriteAll() in REL_16_STABLE then does

  LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

and for the MultiXactOffsetCtl SLRU, shared->ControlLock is
MultiXactOffsetSLRULock (set up by SimpleLruInit(...
MultiXactOffsetSLRULock ...)).
So it tries to take the very lock the same backend already holds.
LWLockAcquire does not detect that and parks the process on
LWLock:MultiXactOffsetSLRU forever.
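The self-deadlock described above can be reproduced in miniature. The sketch below is a toy model only: it swaps PostgreSQL's LWLocks for a POSIX error-checking mutex so that the second acquisition returns EDEADLK instead of sleeping forever, and all toy_* names are hypothetical stand-ins for RecordNewMultiXact(), SimpleLruWriteAll() and the SLRU shared state. The essential point it captures is that the SLRU's ControlLock pointer and the lock the caller already holds are the same object.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Toy model only.  PostgreSQL uses LWLocks, not pthread mutexes, and every
 * toy_* name here is a hypothetical stand-in for the REL_16_STABLE code.
 */
typedef struct ToySlruShared
{
    pthread_mutex_t *ControlLock;   /* SimpleLruInit() stores the SLRU lock here */
} ToySlruShared;

/* An error-checking mutex reports a relock by its owner as EDEADLK. */
void
toy_init_lock(pthread_mutex_t *lock)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Plays SimpleLruWriteAll(): takes shared->ControlLock exclusively. */
int
toy_write_all(ToySlruShared *shared)
{
    int         rc = pthread_mutex_lock(shared->ControlLock);

    if (rc != 0)
        return rc;              /* LWLockAcquire() would instead sleep forever */
    /* ... write out dirty pages ... */
    pthread_mutex_unlock(shared->ControlLock);
    return 0;
}

/*
 * Plays RecordNewMultiXact() in recovery on the path where
 * next_pageno != pageno and last_initialized_offsets_page == -1.
 */
int
toy_record_new_multixact(ToySlruShared *offsets)
{
    int         rc = pthread_mutex_lock(offsets->ControlLock);

    assert(rc == 0);            /* now holding "MultiXactOffsetSLRULock" */
    rc = toy_write_all(offsets);    /* asks for the very same lock again */
    pthread_mutex_unlock(offsets->ControlLock);
    return rc;
}
```

With a PTHREAD_MUTEX_NORMAL mutex, POSIX specifies that such a relock simply deadlocks, which is the closer analogue of the startup process parked on LWLock:MultiXactOffsetSLRU.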

That matches every datum in the report:

  - wait_event = LWLock:MultiXactOffsetSLRU.
  - pg_stat_slru shows zero MultiXact activity, because the
    SimpleLruWriteAll loop never gets past LWLockAcquire to actually
    write a page.
  - Restart unwedges things briefly.
  - The deadlock only triggers when last_initialized_offsets_page is
    still -1, i.e. before any XLOG_MULTIXACT_ZERO_OFF_PAGE record has
    been replayed in this recovery session, which is at most once per
    startup and consistent with the "recurs after catch-up" behaviour.

Is the "safety flush" that the comment justifies actually needed?
Every offsets page that this code path initializes is synchronously written
via SimpleLruWritePage() a few lines below the SimpleLruZeroPage(),
with an Assert that the page is clean afterwards.  So at the moment
we call SimpleLruDoesPhysicalPageExist(), there shouldn't be a relevant
dirty offsets page in the SLRU buffer cache that would lead to a
false negative.  Dropping the SimpleLruWriteAll() call therefore
removes the self-deadlock without changing correctness.

Maybe I'm missing something here. Thoughts?

Regards,
Ayush


* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-21 08:34  Radim Marek <[email protected]>
  parent: Andrey Borodin <[email protected]>
  1 sibling, 1 reply; 9+ messages in thread

From: Radim Marek @ 2026-05-21 08:34 UTC
  To: Andrey Borodin <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Thank you for the follow-up. In the meantime, I can confirm that
commit 77dff5d937b1 appears to be the source of the originally reported issue.

Unfortunately, pinning the version down to 16.12 only avoids the
MultiXactOffsetSLRU self-deadlock; the standby then fails recovery
after 12+ hours.

FATAL: could not access status of transaction 24958976
DETAIL: Could not read from file "pg_multixact/offsets/017C" at offset 221184: read too few bytes.
CONTEXT: WAL redo at 14770/873268E8 for MultiXact/CREATE_ID: 24958975 offset 61500431 nmembers 2: 3058927188 (fornokeyupd) 3058927189 (keysh)
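The file name and offset in this error follow directly from the multixact ID. Assuming the default 8 kB BLCKSZ, 4-byte offsets entries (2048 per page) and 32 pages per SLRU segment file, multixact 24958976 is the first entry of offsets page 12187, i.e. page 27 of segment 0x17C, which starts at byte 27 * 8192 = 221184. That is the page right after the one holding 24958975, the multixact being created, so this failure is again on a page boundary. A minimal sketch of the arithmetic (hypothetical helper, not PostgreSQL code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed REL_16 defaults; adjust if the server was built differently. */
#define TOY_BLCKSZ            8192u
#define TOY_OFFSETS_PER_PAGE  (TOY_BLCKSZ / 4u)  /* 4-byte MultiXactOffset */
#define TOY_PAGES_PER_SEGMENT 32u

/*
 * Compute which pg_multixact/offsets segment file holds the page for
 * `multi`, and the byte offset at which that page starts in the file.
 * SLRU I/O works on whole pages, so read errors report the page start.
 */
void
toy_offsets_location(uint32_t multi, char *segname, size_t len,
                     uint32_t *page_start)
{
    uint32_t    pageno = multi / TOY_OFFSETS_PER_PAGE;
    uint32_t    segno = pageno / TOY_PAGES_PER_SEGMENT;
    uint32_t    rpageno = pageno % TOY_PAGES_PER_SEGMENT;

    assert(segname != NULL && len >= 5);
    snprintf(segname, len, "%04X", (unsigned int) segno);
    *page_start = rpageno * TOY_BLCKSZ;
}
```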

We are going to pin 16.13 and try that until we can safely upgrade
the primary and are confident we have working PITR recovery available,
should we need it.

Radim

PS: Once I have some time, I will try to set up a Docker-based harness to
reproduce the original problem for later testing of the fix.



* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-21 09:06  Radim Marek <[email protected]>
  parent: Radim Marek <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Radim Marek @ 2026-05-21 09:06 UTC
  To: Andrey Borodin <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Although the culprit is known, I've got more data as requested.

#0  0x00007f20e9bdb687 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f20e9bdbc8c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f20e9be6920 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x000055a71796e3ca in PGSemaphoreLock (sema=0x7f20de6d0e38) at
./build/src/backend/port/pg_sema.c:327
#4  0x000055a7179f57ed in LWLockAcquire (lock=0x7f20de6d1800,
mode=mode@entry=LW_EXCLUSIVE) at
./build/../src/backend/storage/lmgr/lwlock.c:1314
#5  0x000055a71772dfb2 in SimpleLruWriteAll (ctl=ctl@entry=0x55a717e83040
<MultiXactOffsetCtlData>, allow_redirtied=allow_redirtied@entry=false) at
./build/../src/backend/access/transam/slru.c:1174
#6  0x000055a717727b6f in RecordNewMultiXact (multi=79871, offset=218449,
nmembers=2, members=members@entry=0x7f20de6831ec) at
./build/../src/backend/access/transam/multixact.c:944
#7  0x000055a71772a983 in multixact_redo (record=0x55a73a8d0fc8) at
./build/../src/backend/access/transam/multixact.c:3464
#8  0x000055a71774d9b8 in ApplyWalRecord (xlogreader=<optimized out>,
record=0x7f20de6831b0, replayTLI=<synthetic pointer>) at
./build/../src/backend/access/transam/xlogrecovery.c:1951
#9  PerformWalRecovery () at
./build/../src/backend/access/transam/xlogrecovery.c:1782
#10 0x000055a717740def in StartupXLOG () at
./build/../src/backend/access/transam/xlog.c:5452
#11 0x000055a71797c7e4 in StartupProcessMain () at
./build/../src/backend/postmaster/startup.c:282
#12 0x000055a717972b20 in AuxiliaryProcessMain
(auxtype=auxtype@entry=StartupProcess)
at ./build/../src/backend/postmaster/auxprocess.c:141
#13 0x000055a717977db3 in StartChildProcess (type=StartupProcess) at
./build/../src/backend/postmaster/postmaster.c:5381
#14 0x000055a71797bfb8 in PostmasterMain (argc=argc@entry=1,
argv=argv@entry=0x55a73a8d0590)
at ./build/../src/backend/postmaster/postmaster.c:1463
#15 0x000055a7176a05bc in main (argc=1, argv=0x55a73a8d0590) at
./build/../src/backend/main/main.c:200

and WAL dump

rmgr: Btree       len (rec/tot):     64/    64, tx:     336098, lsn: 1/32DE75F0, prev 1/32DE7580, desc: INSERT_LEAF off: 244, blkref #0: rel 1663/16384/16432 blk 536
rmgr: MultiXact   len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7630, prev 1/32DE75F0, desc: CREATE_ID 79871 offset 218449 nmembers 2: 336089 (keysh) 336098 (keysh)
rmgr: Heap        len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7668, prev 1/32DE7630, desc: LOCK xmax: 79871, off: 1, infobits: [IS_MULTI, LOCK_ONLY, KEYSHR_LOCK], flags: 0x00, blkref #0: rel 1663/16384/16418 blk 0
rmgr: Heap        len (rec/tot):     72/    72, tx:     336096, lsn: 1/32DE76A0, prev 1/32DE7668, desc: HOT_UPDATE old_xmax: 336096, old_off: 52, old_infobits: [], flags: 0x20, new_xmax: 0, new_off: 149, blkref #0: rel 1663/16384/16401 blk 22
rmgr: Heap        len (rec/tot):     71/    71, tx:     336096, lsn: 1/32DE76E8, prev 1/32DE76A0, desc: HOT_UPDATE old_xmax: 336096, old_off: 149, old_infobits: [], flags: 0x60, new_xmax: 0, new_off: 209, blkref #0: rel 1663/16384/16399 blk 6
rmgr: Heap        len (rec/tot):     79/    79, tx:     336096, lsn: 1/32DE7730, prev 1/32DE76E8, desc: INSERT off: 150, flags: 0x00, blkref #0: rel 1663/16384/16417 blk 741
rmgr: Heap        len (rec/tot):     72/    72, tx:     336097, lsn: 1/32DE7780, prev 1/32DE7730, desc: HOT_UPDATE old_xmax: 336097, old_off: 243, old_infobits: [], flags: 0x20, new_xmax: 0, new_off: 228, blkref #0: rel 1663/16384/16401 blk 26
rmgr: Transaction len (rec/tot):     34/    34, tx:     336096, lsn: 1/32DE77C8, prev 1/32DE7780, desc: COMMIT 2026-05-21 08:43:07.003572 UTC

Radim




* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-22 16:51  Ayush Tiwari <[email protected]>
  parent: Radim Marek <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: Ayush Tiwari @ 2026-05-22 16:51 UTC
  To: Radim Marek <[email protected]>; Andrey Borodin <[email protected]>; Heikki Linnakangas <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Hi,

On Thu, 21 May 2026 at 14:36, Radim Marek <[email protected]> wrote:

> Although the culprit is known, I've got more data as requested.
>
> [...]

Thanks for the additional backtrace and WAL dump.  That makes the failure
mode much clearer.

The latest trace shows the startup process here:

  SimpleLruWriteAll(MultiXactOffsetCtl, false)
  RecordNewMultiXact(multi=79871, offset=218449, nmembers=2, ...)
  multixact_redo()

The WAL dump also shows the matching record:

  rmgr: MultiXact ... desc: CREATE_ID 79871 offset 218449 nmembers 2

79871 is the last multixact on its offsets page, so replaying that record
enters the next_pageno != pageno compatibility path added by 77dff5d937b.
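The page-boundary condition is easy to check by hand. Assuming the default 8 kB BLCKSZ and 4-byte offsets entries, an offsets page holds 2048 entries, and 79871 = 39 * 2048 - 1 is the last entry of page 38, so its successor lands on page 39. The multi in Marko's backtrace, 1201227775, is likewise the last entry of its page. A minimal sketch of the test (hypothetical helper names that mirror, rather than call, the REL_16 MultiXactIdToOffsetPage() arithmetic):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed defaults: BLCKSZ 8192 and a 4-byte MultiXactOffset. */
#define TOY_OFFSETS_PER_PAGE (8192u / 4u)   /* 2048 entries per page */

uint32_t
toy_offset_page(uint32_t multi)
{
    return multi / TOY_OFFSETS_PER_PAGE;
}

/*
 * True when replaying CREATE_ID for `multi` would take the
 * next_pageno != pageno path, i.e. multi is the last entry on its page.
 */
bool
toy_crosses_page(uint32_t multi)
{
    uint32_t    next = multi + 1;

    if (next == 0)              /* skip InvalidMultiXactId (0) on wraparound */
        next = 1;
    return toy_offset_page(next) != toy_offset_page(multi);
}
```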

On REL_14 through REL_16, RecordNewMultiXact() already holds
MultiXactOffsetSLRULock while executing that code.  SimpleLruWriteAll() then
tries to acquire MultiXactOffsetCtl's SLRU control lock, which is the same
MultiXactOffsetSLRULock on those branches.  That explains the standby
startup process waiting forever on LWLock:MultiXactOffsetSLRU, with no
corresponding SLRU I/O activity.

I think the right fix is to remove that SimpleLruWriteAll() call while
keeping the missing-page initialization logic.  The flush is only meant to
make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers
but have not reached disk.  In this fallback path, I don't see a way for
the tested next_pageno to be in that state: if RecordNewMultiXact() itself
initializes the page, it writes it synchronously with SimpleLruWritePage()
before setting last_initialized_offsets_page.

I attached a small patch for REL_16_STABLE.  The same self-deadlock pattern
is also present on PG 14 and 15.  PG 17 and 18 have the same compatibility
call, but SLRU locking is banked there, and RecordNewMultiXact() does not
appear to hold the relevant bank lock before calling SimpleLruWriteAll(),
so I would not describe those branches as having this exact self-deadlock,
though that needs more analysis.

I added both Andrey and Heikki as direct recipients, since I'm not sure
whether this is more severe than the multixact offset issue we had with
16.12 or on par with it.

Regards,
Ayush


Attachments:

  [application/octet-stream] v1-0001-Avoid-self-deadlock-on-MultiXactOffsetSLRULock-dur.patch (2.5K, 3-v1-0001-Avoid-self-deadlock-on-MultiXactOffsetSLRULock-dur.patch)
inline diff:
From b33abeede0847edac3603b87a478a832be1784f8 Mon Sep 17 00:00:00 2001
From: Ayush Tiwari <[email protected]>
Date: Thu, 21 May 2026 07:39:28 +0000
Subject: [PATCH REL_16_STABLE v1] Avoid self-deadlock on
 MultiXactOffsetSLRULock during WAL replay

Commit 77dff5d937b added a compatibility check in RecordNewMultiXact()
that can call SimpleLruWriteAll(MultiXactOffsetCtl, false) while already
holding MultiXactOffsetSLRULock.  In REL_16, SimpleLruWriteAll() tries
to acquire the same SLRU control lock, so WAL replay can self-deadlock
with the startup process waiting on LWLock:MultiXactOffsetSLRU.

The flush is not needed for the page tested in this fallback path.  If
RecordNewMultiXact() initializes that offsets page, it writes it
synchronously with SimpleLruWritePage() before updating
last_initialized_offsets_page.  Drop the unsafe flush and keep the
existing missing-page initialization logic.

Reported-by: Radim Marek <[email protected]>
Reported-by: Marko Tiikkaja <[email protected]>
Diagnosed-by: Andrey Borodin <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
 src/backend/access/transam/multixact.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index f825579e888..5b6b48eb79c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -934,16 +934,17 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		 * seen any XLOG_MULTIXACT_ZERO_OFF_PAGE records yet, which should
 		 * happen at most once after starting WAL recovery.
 		 *
-		 * As an extra safety measure, if we do resort to
-		 * SimpleLruDoesPhysicalPageExist(), flush the SLRU buffers first so
-		 * that it will return an accurate result.
+		 *
+		 * We cannot call SimpleLruWriteAll() to flush the SLRU buffers
+		 * here, because that would self-deadlock on MultiXactOffsetSLRULock,
+		 * which we already hold.  Fortunately we do not need to: every
+		 * page that this code path initializes is synchronously flushed via
+		 * SimpleLruWritePage() below before this lock is released, so there
+		 * are no relevant dirty pages.
 		 *----------
 		 */
 		if (last_initialized_offsets_page == -1)
-		{
-			SimpleLruWriteAll(MultiXactOffsetCtl, false);
 			init_needed = !SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, next_pageno);
-		}
 		else
 			init_needed = (last_initialized_offsets_page == pageno);
 
-- 
2.43.0




* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-26 08:02  Michael Paquier <[email protected]>
  parent: Ayush Tiwari <[email protected]>
  0 siblings, 0 replies; 9+ messages in thread

From: Michael Paquier @ 2026-05-26 08:02 UTC
  To: Ayush Tiwari <[email protected]>; +Cc: Radim Marek <[email protected]>; Andrey Borodin <[email protected]>; Heikki Linnakangas <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On Fri, May 22, 2026 at 10:21:32PM +0530, Ayush Tiwari wrote:
> I think the right fix is to remove that SimpleLruWriteAll() call while
> keeping the missing-page initialization logic.  The flush is only meant to
> make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers
> but have not reached disk.  In this fallback path, I don't see a way for
> the tested next_pageno to be in that state: if RecordNewMultiXact() itself
> initializes the page, it writes it synchronously with SimpleLruWritePage()
> before setting last_initialized_offsets_page.

FWIW, I have a couple of customers complaining about that as well,
as cross-version physical replication is a thing for minor upgrade
flows.  This bug is suddenly making recovery disruptive for some folks
out there.  :(

> I attached a small patch for REL_16_STABLE.  The same self-deadlock pattern
> is also present on PG 14 and 15.  PG 17 and
> 18 have the same compatibility call, but SLRU locking is banked
> there, and RecordNewMultiXact() does not appear to hold the relevant bank
> lock before calling SimpleLruWriteAll(), so I would not describe those
> branches as having this exact self-deadlock, but needs more analysis.

So your core argument is that, while the SimpleLruWriteAll() call is
defensive, it is not actually necessary: if last_initialized_offsets_page
is -1, we have not yet replayed ZERO_OFF_PAGE, so there is no dirty page
that could make SimpleLruDoesPhysicalPageExist() return an incorrect
result, which would be bad.  I am not sure I agree that this assumption
holds all the time; see for example this message in the thread that led
to 77dff5d937b1:

That message mentions this WAL sequence, which is possible because there
is no strict ordering in the creation of mxacts:
ZERO_PAGE:2048 -> CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047

Based on that, if we begin recovery after ZERO_PAGE:2048, we could
finish with this kind of sequence:
CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047

Looking closer, last_initialized_offsets_page stays at -1.  The page
for 2048 was zeroed before the checkpoint by the earlier
ZERO_PAGE:2048.  CREATE_ID:2048 and CREATE_ID:2049 are created first.
Then comes CREATE_ID:2047 which enters the
last_initialized_offsets_page branch.  If we don't have the WriteAll(),
the page where the offsets of 2048 and 2049 are located gets zeroed
while creating 2047, corrupting the existing state of 2048 and 2049.
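A toy model may help make the disagreement concrete. With 2048 offsets entries per page (default 8 kB BLCKSZ assumed), multixacts 2048 and 2049 live on page 1 while 2047 is the last entry of page 0, so replaying CREATE_ID:2047 asks whether page 1 physically exists. The sketch below is not PostgreSQL code and all names are hypothetical; it simply assumes page 1 exists only as a dirty buffer at that moment, and shows that an existence check which looks only at disk zeroes that page unless the buffers are flushed first.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGES    2
#define TOY_PER_PAGE 4          /* tiny pages keep the model readable */

typedef struct ToySlru
{
    bool        on_disk[TOY_PAGES];    /* what a physical-existence check sees */
    bool        dirty[TOY_PAGES];      /* changed in buffers, not yet written */
    unsigned    entries[TOY_PAGES][TOY_PER_PAGE];  /* buffered contents */
} ToySlru;

/* Plays SimpleLruWriteAll(): every dirty buffer reaches disk. */
void
toy_write_all(ToySlru *s)
{
    for (int p = 0; p < TOY_PAGES; p++)
    {
        if (s->dirty[p])
        {
            s->on_disk[p] = true;
            s->dirty[p] = false;
        }
    }
}

/* Plays SimpleLruZeroPage(): wipes the buffered page. */
void
toy_zero_page(ToySlru *s, int p)
{
    memset(s->entries[p], 0, sizeof(s->entries[p]));
    s->dirty[p] = true;
}

/*
 * The last_initialized_offsets_page == -1 branch: optionally flush first,
 * then zero next_page if the existence check cannot see it on disk.
 */
void
toy_check_next_page(ToySlru *s, int next_page, bool flush_first)
{
    if (flush_first)
        toy_write_all(s);
    if (!s->on_disk[next_page])
        toy_zero_page(s, next_page);
}
```

In the usage below, entries[1][0] and entries[1][1] stand for the offsets recorded for 2048 and 2049: without the flush they are wiped, with it they survive.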

A different approach would be to release and re-acquire
MultiXactOffsetSLRULock around the SimpleLruWriteAll() call, and I think
that it should actually be safe.  Even if read-only backends evict
dirty pages between the moment the lock is released and the moment it
is re-acquired in SimpleLruWriteAll(), the pages would be written to
disk due to the eviction, which is what we want for correctness.  And
only the startup process dirties offset pages during recovery, AFAIK.
Thoughts?
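The lock ordering of this alternative can be sketched as follows. This is not a patch and not PostgreSQL code: a POSIX error-checking mutex stands in for MultiXactOffsetSLRULock (so an accidental relock by the owner would surface as EDEADLK rather than a hang), and the toy_* names are hypothetical. The point is only the ordering: drop the lock, let the flush take it on its own, then re-acquire it before the existence check.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

typedef struct ToySlruShared
{
    pthread_mutex_t *ControlLock;   /* stands in for MultiXactOffsetSLRULock */
} ToySlruShared;

/* An error-checking mutex reports a relock by its owner as EDEADLK. */
void
toy_init_lock(pthread_mutex_t *lock)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Plays SimpleLruWriteAll(): acquires the control lock itself. */
int
toy_write_all(ToySlruShared *shared)
{
    int         rc = pthread_mutex_lock(shared->ControlLock);

    if (rc != 0)
        return rc;
    /* ... write out dirty pages ... */
    return pthread_mutex_unlock(shared->ControlLock);
}

/* Recovery path with the lock released around the flush. */
int
toy_record_new_multixact_relock(ToySlruShared *offsets)
{
    int         rc = pthread_mutex_lock(offsets->ControlLock);

    assert(rc == 0);
    /* ... work that needs the lock ... */
    pthread_mutex_unlock(offsets->ControlLock);  /* release before flushing */
    rc = toy_write_all(offsets);                 /* can now take the lock */
    pthread_mutex_lock(offsets->ControlLock);    /* re-acquire */
    /* ... existence check and possible page init would follow here ... */
    pthread_mutex_unlock(offsets->ControlLock);
    return rc;
}
```

The cost of this ordering is the window between unlock and re-lock, during which other processes can touch the SLRU; the argument above is that evictions in that window only write pages out, and that only the startup process dirties offsets pages during recovery.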

> Added both Andrey and Heikki in to-mail, since I'm not sure if this
> is more extreme than the multixact offset issue we had with 16.12, or it
> is at par with that.

Indeed, let's wait for at least Heikki's input.  

Anyway, for any fixes, I don't think that it would be a good idea to
skip v17 and v18 and rely on the SLRU bank locks not conflicting to
sidestep the WriteAll() deadlock.  Let's keep all the branches across
v14~v18 in sync.
--
Michael

Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)


end of thread, other threads:[~2026-05-26 08:02 UTC | newest]

Thread overview: 9+ messages
2026-05-20 21:16 BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8 PG Bug reporting form <[email protected]>
2026-05-21 07:07 ` Andrey Borodin <[email protected]>
2026-05-21 07:12   ` Marko Tiikkaja <[email protected]>
2026-05-21 07:25     ` Andrey Borodin <[email protected]>
2026-05-21 07:45       ` Ayush Tiwari <[email protected]>
2026-05-21 08:34       ` Radim Marek <[email protected]>
2026-05-21 09:06         ` Radim Marek <[email protected]>
2026-05-22 16:51           ` Ayush Tiwari <[email protected]>
2026-05-26 08:02             ` Michael Paquier <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox