BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8

20+ messages / 8 participants

* BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-20 21:16  PG Bug reporting form <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: PG Bug reporting form @ 2026-05-20 21:16 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

The following bug has been logged on the website:

Bug reference:      19490
Logged by:          Radim Marek
Email address:      [email protected]
PostgreSQL version: 16.14
Operating system:   Linux - Ubuntu 22.04
Description:        

Hello,
due to a mistake we ran a standby on a higher 16.x minor version against a
non-upgraded primary. This led to repeated problems with WAL replay.

Description:

A streaming replication standby running 16.14 stops advancing replay while
WAL keeps arriving from a 16.8 primary. The startup process is parked in
futex_wait_queue with wait_event = LWLock:MultiXactOffsetSLRU and no longer
makes progress.

pg_stat_slru shows zero MultiXact activity over the same window, so it
appears to stop on the lock itself rather than inside any SLRU read/write
path. Downgrading the standby binary to 16.12 (same data directory) resolved
the symptom under the same workload.

Configuration:

Primary: 16.8-1.pgdg22.04+1. The problem was observed both under load and
when "relatively" idle (below 1000 QPS).
Replica: 16.14-1.pgdg22.04+1, physical streaming, async, no cascading. It was
the only replica on 16.14 (due to misconfiguration); the other replicas
(running 16.8) were not affected.

hot_standby_feedback enabled, logical replication from the primary. Default
WAL segment size. Default SLRU buffer sizes.

Observed symptoms on the standby

1. pg_stat_replication on primary, just the affected node

client_addr  state      sent_lag  write_lag  flush_lag  replay_lag_bytes  replay_lag
10.x.x.x     streaming  0         0          0          8766784344        02:42:50

2. Receive/write/flush all at the primary's current LSN; only replay is far
behind and growing.

3. Startup process wait event on standby (sampled repeatedly, always
identical)
pid    wait_event_type    wait_event             state
19095  LWLock             MultiXactOffsetSLRU    (null)

4. Kernel stack of the startup process
cat /proc/19095/stack
[<0>] futex_wait_queue+0x67/0xa0
[<0>] __futex_wait+0x155/0x1d0
[<0>] futex_wait+0x74/0x120
[<0>] do_futex+0x16d/0x230
[<0>] __x64_sys_futex+0x95/0x200
[<0>] x64_sys_call+0x117b/0x2480
[<0>] do_syscall_64+0x81/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80
cat /proc/19095/wchan
futex_wait_queue

5. pg_stat_slru on the standby, after pg_stat_reset_slru(NULL) and a
60-second wait under live WAL streaming
name             blks_zeroed  blks_hit  blks_read  blks_written
MultiXactMember  0            0         0          0
MultiXactOffset  0            0         0          0

6. There was no MultiXact SLRU activity while the startup process is
reportedly waiting on the MultiXact offset SLRU lock.

7. Replay LSN frozen, receive LSN advancing. Sampled 60 sec apart.
recv             replay          lag_bytes
1476A/D1DA158    14767/EE01DB78  9111848416
1476A/EB565D0    14767/EE01DB78  9138571864

8. No replay progress; ~9 GB of WAL buffered locally that is never applied.

9. Other backends on the standby: only a diagnostic psql client. No
hot-standby readers.

10. MultiXact age on the primary is small (~360k on most DBs, ~239k on the
main DB). No MultiXact storm.
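
The lag_bytes figures in the LSN samples above can be cross-checked by hand. The following is a minimal sketch, assuming the usual textual LSN form of two hexadecimal halves separated by a slash (high 32 bits, then low 32 bits); the helper names parse_lsn() and lsn_lag_bytes() are hypothetical and not part of PostgreSQL.

```c
#include <stdint.h>
#include <stdio.h>

/* Parse an LSN written as "HI/LO" (both hex) into a 64-bit WAL position.
 * Returns 0 on success, -1 if the text does not match that form. */
int parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi, lo;

    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return -1;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 0;
}

/* Compute how many bytes of WAL were received but not yet replayed.
 * Returns 0 on success, -1 on a parse error or if replay is ahead of recv. */
int lsn_lag_bytes(const char *recv, const char *replay, uint64_t *lag)
{
    uint64_t r, p;

    if (parse_lsn(recv, &r) != 0 || parse_lsn(replay, &p) != 0 || p > r)
        return -1;
    *lag = r - p;
    return 0;
}
```

Feeding it the two samples from item 7 reproduces the reported 9111848416 and 9138571864 bytes. On a live server the same subtraction is available in SQL through pg_wal_lsn_diff().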

Workarounds

- Restarting the standby cleared the block, but once it caught up the hang
recurred.
- Downgrading the standby binary to 16.12 (16.12-1.pgdg22.04+1) against
the same data directory restored normal replay. After 60s under the same
workload pg_stat_slru shows only 2 hits / 0 reads on MultiXact.

I understand that running 6 minor versions behind is not a particularly good
setup, but given that this is a supported direction, it might be worth at
least a mention in the 16.13/16.14 release notes.

---

Hope this helps,
Radim


* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-21 07:07  Andrey Borodin <[email protected]>
  parent: PG Bug reporting form <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Andrey Borodin @ 2026-05-21 07:07 UTC (permalink / raw)
  To: [email protected]; PostgreSQL mailing lists <[email protected]>

Thanks for the report!

Oh, this seems to be from the "gift that keeps on giving" department.
Related to [0]

> On 20 May 2026, at 14:16, PG Bug reporting form <[email protected]> wrote:
> 
> Downgrading the standby binary to 16.12 (16.12-1.pgdg22.04+1) against
> the same data directory restored normal replay. After 60s under the same
> workload pg_stat_slru shows only 2 hits / 0 reads on MultiXact.

Are you sure that it's not 16.11 that is resolving the problem?
Can you get a backtrace of the hanging startup process with debug symbols? Or obtain
the last replayed LSN and do a WAL dump in the area where the startup process deadlocked.

I don't see how this might be a result of [1] and [2], so, perhaps, it's some more peculiarities
from [3]. But 16.12 has [3]...


Best regards, Andrey Borodin.


[0] https://www.postgresql.org/message-id/flat/CACV2tSw3VYS7d27ftO_cs%2BaF3M54%2BJwWBbqSGLcKoG9cvyb6EA%4...
[1] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=77dff5d937b192b85c55bc...
[2] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=23064542f8bdcbc4b6a513...
[3] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=6351669130782ed01eed3a...





* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-21 07:12  Marko Tiikkaja <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Marko Tiikkaja @ 2026-05-21 07:12 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: [email protected]; PostgreSQL mailing lists <[email protected]>

Hi Andrey,

On Thu, May 21, 2026 at 10:07 AM Andrey Borodin <[email protected]> wrote:
> Are you sure that it's not 16.11 that is resolving the problem?
> Can you get a backtrace of hanging startup process with debug symbols?

We had this problem just this morning:

#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=265, expected=0, futex_word=0x785c290170b8) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x785c290170b8) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x785c290170b8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000786048c9cbdf in do_futex_wait (sem=sem@entry=0x785c290170b8, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000786048c9cc78 in __new_sem_wait_slow64 (sem=sem@entry=0x785c290170b8, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x0000786048c9ccf1 in __new_sem_wait (sem=sem@entry=0x785c290170b8) at ./nptl/sem_wait.c:42
#6  0x0000654c8b150b86 in PGSemaphoreLock (sema=0x785c290170b8) at port/pg_sema.c:327
#7  LWLockAcquire (lock=0x785c29017a80, mode=LW_EXCLUSIVE) at storage/lmgr/./build/../src/backend/storage/lmgr/lwlock.c:1314
#8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400 <MultiXactOffsetCtlData.lto_priv.0>, allow_redirtied=<optimized out>) at access/transam/./build/../src/backend/access/transam/slru.c:1174
#9  0x0000654c8ae22719 in RecordNewMultiXact (multi=1201227775, offset=2755202388, nmembers=2, members=0x7860465ec28c) at access/transam/./build/../src/backend/access/transam/multixact.c:944
#10 0x0000654c8ae255c6 in multixact_redo (record=0x654cb292c620) at access/transam/./build/../src/backend/access/transam/multixact.c:3464
#11 0x0000654c8ae4ea2d in ApplyWalRecord (replayTLI=<synthetic pointer>, record=0x7860465ec250, xlogreader=<optimized out>) at access/transam/./build/../src/include/access/xlog_internal.h:379
#12 PerformWalRecovery () at access/transam/./build/../src/backend/access/transam/xlogrecovery.c:1782
#13 0x0000654c8ae3bcb7 in StartupXLOG () at access/transam/./build/../src/backend/access/transam/xlog.c:5452
#14 0x0000654c8b0cbe7b in StartupProcessMain () at postmaster/./build/../src/backend/postmaster/startup.c:282

We downgraded to 16.13 and the problem went away.


.m






* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-21 07:25  Andrey Borodin <[email protected]>
  parent: Marko Tiikkaja <[email protected]>
  0 siblings, 2 replies; 20+ messages in thread

From: Andrey Borodin @ 2026-05-21 07:25 UTC (permalink / raw)
  To: Marko Tiikkaja <[email protected]>; +Cc: [email protected]; PostgreSQL mailing lists <[email protected]>

> On 21 May 2026, at 00:12, Marko Tiikkaja <[email protected]> wrote:
> 
> #8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400

Thanks!

This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
If by chance you will have a backtrace of another deadlocking process -
please post it.

But it's not strictly necessary for analysis, I think we can figure out what
happened from the backtrace you already posted.

Best regards, Andrey Borodin.


* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-21 07:45  Ayush Tiwari <[email protected]>
  parent: Andrey Borodin <[email protected]>
  1 sibling, 0 replies; 20+ messages in thread

From: Ayush Tiwari @ 2026-05-21 07:45 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; [email protected]; PostgreSQL mailing lists <[email protected]>

Hi,

On Thu, 21 May 2026 at 12:55, Andrey Borodin <[email protected]> wrote:

>
>
> > On 21 May 2026, at 00:12, Marko Tiikkaja <[email protected]> wrote:
> >
> > #8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400
>
> Thanks!
>
> This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
> If by chance you will have a backtrace of another deadlocking process -
> please post it.
>
> But it's not strictly necessary for analysis, I think we can figure out
> what
> happened from the backtrace you already posted.
>

I had a look at the code that Marko's backtrace pointed at and I
believe this is a straightforward self-deadlock introduced by
77dff5d937b.

In RecordNewMultiXact() on REL_16_STABLE:

  LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);

  ...

  if (InRecovery && next_pageno != pageno)
  {
      ...
      if (last_initialized_offsets_page == -1)
      {
          SimpleLruWriteAll(MultiXactOffsetCtl, false);  /* <-- here */
          init_needed = !SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl,
                                                        next_pageno);
      }
      else
          init_needed = (last_initialized_offsets_page == pageno);
      ...
  }

The outer LWLockAcquire takes MultiXactOffsetSLRULock EXCLUSIVE.
SimpleLruWriteAll() in REL_16_STABLE then does

  LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

and for the MultiXactOffsetCtl SLRU, shared->ControlLock is
MultiXactOffsetSLRULock (set up by
SimpleLruInit(... MultiXactOffsetSLRULock ...)).
So it tries to take the very lock the same backend already holds.
LWLockAcquire does not detect that and parks the process on
LWLock:MultiXactOffsetSLRU forever.
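
The pattern can be shown in isolation. This is a toy model only, not PostgreSQL code: ToyLock and the toy_* functions are hypothetical stand-ins, and where a real LWLockAcquire() would sleep forever the toy acquire simply returns false so that the outcome is observable.

```c
#include <stdbool.h>

/* Toy stand-in for a non-reentrant lock such as an LWLock. */
typedef struct ToyLock
{
    bool held;
} ToyLock;

/* Returns true if the lock was taken.  Returns false in the situation
 * where a real non-reentrant lock would block: the lock is already held. */
bool toy_acquire(ToyLock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

void toy_release(ToyLock *lock)
{
    lock->held = false;
}

/* Models a flush routine that takes the SLRU control lock by itself. */
bool toy_write_all(ToyLock *control_lock)
{
    if (!toy_acquire(control_lock))
        return false;           /* would hang here in the real system */
    /* ... dirty pages would be written out here ... */
    toy_release(control_lock);
    return true;
}

/* Models the replay path: the caller takes the lock, then calls the
 * flush routine, which wants the very same lock. */
bool toy_record_new_multixact(ToyLock *offset_lock)
{
    bool flushed;

    if (!toy_acquire(offset_lock))
        return false;
    flushed = toy_write_all(offset_lock);   /* same lock: cannot succeed */
    toy_release(offset_lock);
    return flushed;
}
```

Calling toy_write_all() on a free lock succeeds, while toy_record_new_multixact() on the same lock reports failure, which corresponds to the startup process parking on the lock it already owns.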

That matches every datum in the report:

  - wait_event = LWLock:MultiXactOffsetSLRU.
  - pg_stat_slru shows zero MultiXact activity, because the
    SimpleLruWriteAll loop never gets past LWLockAcquire to actually
    write a page.
  - Restart unwedges things briefly.
  - The deadlock only triggers when last_initialized_offsets_page is
    still -1, i.e. before any XLOG_MULTIXACT_ZERO_OFF_PAGE record has
    been replayed in this recovery session, which is at most once per
    startup and consistent with the "recurs after catch-up" behaviour.

Is the "safety flush" that the comment justifies actually needed?
Every offsets page that this code path initializes is synchronously written
via SimpleLruWritePage() a few lines below the SimpleLruZeroPage(),
with an Assert that the page is clean afterwards.  So at the moment
we call SimpleLruDoesPhysicalPageExist(), there shouldn't be a relevant
dirty offsets page in the SLRU buffer cache that would lead to a
false negative.  Dropping the SimpleLruWriteAll() call therefore
removes the self-deadlock without changing correctness.

Maybe I'm missing something here. Thoughts?

Regards,
Ayush


* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-21 08:34  Radim Marek <[email protected]>
  parent: Andrey Borodin <[email protected]>
  1 sibling, 1 reply; 20+ messages in thread

From: Radim Marek @ 2026-05-21 08:34 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Thank you for the follow-up. In the meantime I can confirm that
commit 77dff5d937b1 might be the source of the originally reported issue.

Unfortunately, pinning the version down to 16.12 only avoids the
MultiXactOffsetSLRU self-deadlock; the standby then fails recovery
after 12+ hours.

FATAL: could not access status of transaction 24958976
DETAIL: Could not read from file "pg_multixact/offsets/017C" at offset 221184: read too few bytes.
CONTEXT: WAL redo at 14770/873268E8 for MultiXact/CREATE_ID: 24958975 offset 61500431 nmembers 2: 3058927188 (fornokeyupd) 3058927189 (keysh)
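
The file name and byte offset in that message follow from the multixact ID by plain arithmetic. Below is a minimal sketch, assuming the default 8 kB block size, 4-byte offset entries (2048 per page) and 32 SLRU pages per segment file, as on PG 16; OffsetLocation and locate_multixact_offset() are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ 8192                               /* assumed default block size */
#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / 4)   /* assumed 4-byte entries */
#define SLRU_PAGES_PER_SEGMENT 32                 /* pages per segment file */

typedef struct OffsetLocation
{
    unsigned int segno;       /* segment number */
    char         file[16];    /* segment file name, four hex digits */
    long         byte_offset; /* start of the page inside that file */
} OffsetLocation;

/* Map a multixact ID to the offsets segment file and page start
 * that hold its entry. */
OffsetLocation locate_multixact_offset(uint32_t multi)
{
    OffsetLocation loc;
    uint32_t pageno = multi / MULTIXACT_OFFSETS_PER_PAGE;
    uint32_t rpageno = pageno % SLRU_PAGES_PER_SEGMENT;

    loc.segno = (unsigned int) (pageno / SLRU_PAGES_PER_SEGMENT);
    snprintf(loc.file, sizeof(loc.file), "%04X", loc.segno);
    loc.byte_offset = (long) rpageno * BLCKSZ;
    return loc;
}
```

For multixact 24958976 this gives page 12187, which is page 27 of segment 017C and starts at byte 221184, matching the DETAIL line. It also shows that 24958976 is the first entry of a new page, directly after the 24958975 being replayed.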

We are going to pin 16.13 and try that until we can safely upgrade the
primary and are confident we have working PITR recovery available should
we need it.

Radim

PS: Once I have some time I will try to set up a Docker-based harness to
reproduce the original problem, for later testing of the fix.


* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-21 09:06  Radim Marek <[email protected]>
  parent: Radim Marek <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Radim Marek @ 2026-05-21 09:06 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Although the culprit is known, I've got more data as requested.

#0  0x00007f20e9bdb687 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f20e9bdbc8c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f20e9be6920 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x000055a71796e3ca in PGSemaphoreLock (sema=0x7f20de6d0e38) at ./build/src/backend/port/pg_sema.c:327
#4  0x000055a7179f57ed in LWLockAcquire (lock=0x7f20de6d1800, mode=mode@entry=LW_EXCLUSIVE) at ./build/../src/backend/storage/lmgr/lwlock.c:1314
#5  0x000055a71772dfb2 in SimpleLruWriteAll (ctl=ctl@entry=0x55a717e83040 <MultiXactOffsetCtlData>, allow_redirtied=allow_redirtied@entry=false) at ./build/../src/backend/access/transam/slru.c:1174
#6  0x000055a717727b6f in RecordNewMultiXact (multi=79871, offset=218449, nmembers=2, members=members@entry=0x7f20de6831ec) at ./build/../src/backend/access/transam/multixact.c:944
#7  0x000055a71772a983 in multixact_redo (record=0x55a73a8d0fc8) at ./build/../src/backend/access/transam/multixact.c:3464
#8  0x000055a71774d9b8 in ApplyWalRecord (xlogreader=<optimized out>, record=0x7f20de6831b0, replayTLI=<synthetic pointer>) at ./build/../src/backend/access/transam/xlogrecovery.c:1951
#9  PerformWalRecovery () at ./build/../src/backend/access/transam/xlogrecovery.c:1782
#10 0x000055a717740def in StartupXLOG () at ./build/../src/backend/access/transam/xlog.c:5452
#11 0x000055a71797c7e4 in StartupProcessMain () at ./build/../src/backend/postmaster/startup.c:282
#12 0x000055a717972b20 in AuxiliaryProcessMain (auxtype=auxtype@entry=StartupProcess) at ./build/../src/backend/postmaster/auxprocess.c:141
#13 0x000055a717977db3 in StartChildProcess (type=StartupProcess) at ./build/../src/backend/postmaster/postmaster.c:5381
#14 0x000055a71797bfb8 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x55a73a8d0590) at ./build/../src/backend/postmaster/postmaster.c:1463
#15 0x000055a7176a05bc in main (argc=1, argv=0x55a73a8d0590) at ./build/../src/backend/main/main.c:200

and WAL dump

rmgr: Btree       len (rec/tot):     64/    64, tx:     336098, lsn: 1/32DE75F0, prev 1/32DE7580, desc: INSERT_LEAF off: 244, blkref #0: rel 1663/16384/16432 blk 536
rmgr: MultiXact   len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7630, prev 1/32DE75F0, desc: CREATE_ID 79871 offset 218449 nmembers 2: 336089 (keysh) 336098 (keysh)
rmgr: Heap        len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7668, prev 1/32DE7630, desc: LOCK xmax: 79871, off: 1, infobits: [IS_MULTI, LOCK_ONLY, KEYSHR_LOCK], flags: 0x00, blkref #0: rel 1663/16384/16418 blk 0
rmgr: Heap        len (rec/tot):     72/    72, tx:     336096, lsn: 1/32DE76A0, prev 1/32DE7668, desc: HOT_UPDATE old_xmax: 336096, old_off: 52, old_infobits: [], flags: 0x20, new_xmax: 0, new_off: 149, blkref #0: rel 1663/16384/16401 blk 22
rmgr: Heap        len (rec/tot):     71/    71, tx:     336096, lsn: 1/32DE76E8, prev 1/32DE76A0, desc: HOT_UPDATE old_xmax: 336096, old_off: 149, old_infobits: [], flags: 0x60, new_xmax: 0, new_off: 209, blkref #0: rel 1663/16384/16399 blk 6
rmgr: Heap        len (rec/tot):     79/    79, tx:     336096, lsn: 1/32DE7730, prev 1/32DE76E8, desc: INSERT off: 150, flags: 0x00, blkref #0: rel 1663/16384/16417 blk 741
rmgr: Heap        len (rec/tot):     72/    72, tx:     336097, lsn: 1/32DE7780, prev 1/32DE7730, desc: HOT_UPDATE old_xmax: 336097, old_off: 243, old_infobits: [], flags: 0x20, new_xmax: 0, new_off: 228, blkref #0: rel 1663/16384/16401 blk 26
rmgr: Transaction len (rec/tot):     34/    34, tx:     336096, lsn: 1/32DE77C8, prev 1/32DE7780, desc: COMMIT 2026-05-21 08:43:07.003572 UTC

Radim


* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-22 16:51  Ayush Tiwari <[email protected]>
  parent: Radim Marek <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Ayush Tiwari @ 2026-05-22 16:51 UTC (permalink / raw)
  To: Radim Marek <[email protected]>; Andrey Borodin <[email protected]>; Heikki Linnakangas <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Hi,

Thanks for the additional backtrace and WAL dump.  That makes the failure
mode much clearer.

The latest trace shows the startup process here:

  SimpleLruWriteAll(MultiXactOffsetCtl, false)
  RecordNewMultiXact(multi=79871, offset=218449, nmembers=2, ...)
  multixact_redo()

The WAL dump also shows the matching record:

  rmgr: MultiXact ... desc: CREATE_ID 79871 offset 218449 nmembers 2

79871 is the last multixact on its offsets page, so replaying that record
enters the next_pageno != pageno compatibility path added by 77dff5d937b.
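
The page-boundary claim is easy to verify. A minimal sketch, assuming the default 8 kB block size and 4-byte offset entries (2048 per page) and ignoring multixact ID wraparound; offsets_page() and crosses_offsets_page() are hypothetical helper names, not functions from multixact.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ 8192                               /* assumed default block size */
#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / 4)   /* assumed 4-byte entries */

/* Offsets page that holds the entry for this multixact ID. */
uint32_t offsets_page(uint32_t multi)
{
    return multi / MULTIXACT_OFFSETS_PER_PAGE;
}

/* True if the following multixact ID lives on a different offsets page,
 * i.e. the replay path sees next_pageno != pageno.  Wraparound of the
 * ID space is deliberately not handled in this sketch. */
bool crosses_offsets_page(uint32_t multi)
{
    return offsets_page(multi + 1) != offsets_page(multi);
}
```

79871 is 39 * 2048 - 1, so it sits in the last slot of page 38 and its successor opens page 39. The multi=1201227775 from the earlier backtrace is likewise the last entry on its page.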

On REL_14 through REL_16, RecordNewMultiXact() already holds
MultiXactOffsetSLRULock while executing that code.  SimpleLruWriteAll() then
tries to acquire MultiXactOffsetCtl's SLRU control lock, which is the same
MultiXactOffsetSLRULock on those branches.  That explains the standby
startup process waiting forever on LWLock:MultiXactOffsetSLRU, with no
corresponding SLRU I/O activity.

I think the right fix is to remove that SimpleLruWriteAll() call while
keeping the missing-page initialization logic.  The flush is only meant to
make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers
but have not reached disk.  In this fallback path, I don't see a way for
the tested next_pageno to be in that state: if RecordNewMultiXact() itself
initializes the page, it writes it synchronously with SimpleLruWritePage()
before setting last_initialized_offsets_page.

I attached a small patch for REL_16_STABLE.  The same self-deadlock pattern
is also present on PG 14 and 15.  PG 17 and 18 have the same compatibility
call, but SLRU locking is banked there, and RecordNewMultiXact() does not
appear to hold the relevant bank lock before calling SimpleLruWriteAll(), so
I would not describe those branches as having this exact self-deadlock,
though that needs more analysis.

I added both Andrey and Heikki to the To: list, since I'm not sure whether
this is more severe than the multixact offset issue we had with 16.12, or
on par with it.

Regards,
Ayush


Attachments:

  [application/octet-stream] v1-0001-Avoid-self-deadlock-on-MultiXactOffsetSLRULock-dur.patch (2.5K, 3-v1-0001-Avoid-self-deadlock-on-MultiXactOffsetSLRULock-dur.patch)
From b33abeede0847edac3603b87a478a832be1784f8 Mon Sep 17 00:00:00 2001
From: Ayush Tiwari <[email protected]>
Date: Thu, 21 May 2026 07:39:28 +0000
Subject: [PATCH REL_16_STABLE v1] Avoid self-deadlock on
 MultiXactOffsetSLRULock during WAL replay

Commit 77dff5d937b added a compatibility check in RecordNewMultiXact()
that can call SimpleLruWriteAll(MultiXactOffsetCtl, false) while already
holding MultiXactOffsetSLRULock.  In REL_16, SimpleLruWriteAll() tries
to acquire the same SLRU control lock, so WAL replay can self-deadlock
with the startup process waiting on LWLock:MultiXactOffsetSLRU.

The flush is not needed for the page tested in this fallback path.  If
RecordNewMultiXact() initializes that offsets page, it writes it
synchronously with SimpleLruWritePage() before updating
last_initialized_offsets_page.  Drop the unsafe flush and keep the
existing missing-page initialization logic.

Reported-by: Radim Marek <[email protected]>
Reported-by: Marko Tiikkaja <[email protected]>
Diagnosed-by: Andrey Borodin <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
 src/backend/access/transam/multixact.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index f825579e888..5b6b48eb79c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -934,16 +934,17 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		 * seen any XLOG_MULTIXACT_ZERO_OFF_PAGE records yet, which should
 		 * happen at most once after starting WAL recovery.
 		 *
-		 * As an extra safety measure, if we do resort to
-		 * SimpleLruDoesPhysicalPageExist(), flush the SLRU buffers first so
-		 * that it will return an accurate result.
+		 *
+		 * We cannot call SimpleLruWriteAll() to flush the SLRU buffers
+		 * here, because that would self-deadlock on MultiXactOffsetSLRULock,
+		 * which we already hold.  Fortunately we do not need to: every
+		 * page that this code path initializes is synchronously flushed via
+		 * SimpleLruWritePage() below before this lock is released, so there
+		 * are no relevant dirty pages.
 		 *----------
 		 */
 		if (last_initialized_offsets_page == -1)
-		{
-			SimpleLruWriteAll(MultiXactOffsetCtl, false);
 			init_needed = !SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, next_pageno);
-		}
 		else
 			init_needed = (last_initialized_offsets_page == pageno);
 
-- 
2.43.0




* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-26 08:02  Michael Paquier <[email protected]>
  parent: Ayush Tiwari <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Michael Paquier @ 2026-05-26 08:02 UTC (permalink / raw)
  To: Ayush Tiwari <[email protected]>; +Cc: Radim Marek <[email protected]>; Andrey Borodin <[email protected]>; Heikki Linnakangas <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On Fri, May 22, 2026 at 10:21:32PM +0530, Ayush Tiwari wrote:
> I think the right fix is to remove that SimpleLruWriteAll() call while
> keeping the missing-page initialization logic.  The flush is only meant to
> make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers
> but have not reached disk.  In this fallback path, I don't see a way for
> the tested next_pageno to be in that state: if RecordNewMultiXact() itself
> initializes the page, it writes it synchronously with SimpleLruWritePage()
> before setting last_initialized_offsets_page.

FWIW, I have a couple of customers complaining about that as well,
as cross-version physical replication is a thing for minor upgrade
flows.  This bug is suddenly making recovery disruptive for some folks
out there.  :(

> I attached a small patch for REL_16_STABLE.  The same self-deadlock pattern
> is also present on PG 14 and 15.  PG 17 and
> 18 have the same compatibility call, but SLRU locking is banked
> there, and RecordNewMultiXact() does not appear to hold the relevant bank
> lock before calling SimpleLruWriteAll(), so I would not describe those
> branches as having this exact self-deadlock, but needs more analysis.

So your root argument is that while the SimpleLruWriteAll() is
defensive, it is not actually necessary: when
last_initialized_offsets_page is -1 we have not yet replayed
ZERO_OFF_PAGE, and we have no dirty page that could make
SimpleLruDoesPhysicalPageExist() return an incorrect result, which
would be bad.  I am not sure I agree that this assumption is correct
all the time, see for example the WAL sequence mentioned in the thread
that has led to 77dff5d937b1:
https://www.postgresql.org/message-id/33319276-e4d0-4773-89e4-09084905fdb0%40iki.fi

That message mentions this WAL sequence, which is possible because
there is no strict ordering in the creation of the mxacts:
ZERO_PAGE:2048 -> CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047

Based on that, if we begin recovery after ZERO_PAGE:2048, we could
finish with this kind of sequence:
CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047

Looking closer, last_initialized_offsets_page stays at -1.  The page
for 2048 was zeroed before the checkpoint by the earlier
ZERO_PAGE:2048.  CREATE_ID:2048 and CREATE_ID:2049 are created first.
Then comes CREATE_ID:2047 which enters the
last_initialized_offsets_page branch.  If we don't have the WriteAll(),
the page where the offsets of 2048 and 2049 are located gets zeroed
while creating 2047, corrupting the existing state of 2048 and 2049.
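
The false negative described above can be sketched with a toy model.
This is not PostgreSQL code: ToySlru and every toy_* name are
hypothetical, the model ignores last_initialized_offsets_page, and it
assumes the page holding 2048 and 2049 exists only as a dirty buffer
when CREATE_ID:2047 is replayed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OFFSETS_PER_PAGE 2048
#define NPAGES 4

/* Hypothetical stand-in for an SLRU: in-memory buffers plus "disk". */
typedef struct ToySlru
{
	uint32_t	buffer[NPAGES][OFFSETS_PER_PAGE];
	uint32_t	disk[NPAGES][OFFSETS_PER_PAGE];
	bool		in_buffer[NPAGES];
	bool		dirty[NPAGES];
	bool		on_disk[NPAGES];
} ToySlru;

/* Write every dirty buffer to "disk" (the role SimpleLruWriteAll() plays). */
static void
toy_write_all(ToySlru *s)
{
	for (int p = 0; p < NPAGES; p++)
	{
		if (s->dirty[p])
		{
			memcpy(s->disk[p], s->buffer[p], sizeof(s->disk[p]));
			s->on_disk[p] = true;
			s->dirty[p] = false;
		}
	}
}

/* Return the buffered page, loading it from disk or zero-filling it. */
static uint32_t *
toy_page(ToySlru *s, int page)
{
	if (!s->in_buffer[page])
	{
		if (s->on_disk[page])
			memcpy(s->buffer[page], s->disk[page], sizeof(s->buffer[page]));
		else
			memset(s->buffer[page], 0, sizeof(s->buffer[page]));
		s->in_buffer[page] = true;
	}
	return s->buffer[page];
}

/* Replay one CREATE_ID record, with or without the flush before the check. */
static void
toy_replay_create_id(ToySlru *s, uint32_t multi, uint32_t offset,
					 bool flush_first)
{
	int			page = multi / OFFSETS_PER_PAGE;
	int			next_page = (multi + 1) / OFFSETS_PER_PAGE;

	if (next_page != page)
	{
		if (flush_first)
			toy_write_all(s);

		/* The existence check looks at disk only, never at dirty buffers. */
		if (!s->on_disk[next_page])
		{
			memset(s->buffer[next_page], 0, sizeof(s->buffer[next_page]));
			s->in_buffer[next_page] = true;
			s->dirty[next_page] = true;
		}
	}

	toy_page(s, page)[multi % OFFSETS_PER_PAGE] = offset;
	s->dirty[page] = true;
}

/* Replay CREATE_ID:2048 -> 2049 -> 2047; return the offset kept for 2048. */
static uint32_t
toy_scenario(bool flush_first)
{
	static ToySlru s;

	memset(&s, 0, sizeof(s));
	toy_replay_create_id(&s, 2048, 100, flush_first);
	toy_replay_create_id(&s, 2049, 200, flush_first);
	toy_replay_create_id(&s, 2047, 50, flush_first);
	return toy_page(&s, 1)[2048 % OFFSETS_PER_PAGE];
}
```

In this model, toy_scenario(true) keeps the offset recorded for 2048,
while toy_scenario(false) returns 0 because the page was zeroed again
while replaying 2047.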

A different approach would be to release and re-acquire the
MultiXactOffsetSLRULock while calling SimpleLruWriteAll(), and I think
that it should be actually safe.  Even if read-only backends evict
dirty pages between the moment the lock is released and the moment it
is re-acquired in SimpleLruWriteAll(), the pages would be
written to disk due to the eviction, which is what we want for
correctness.  And only the startup process dirties offset pages during
recovery, AFAIK.  Thoughts?

> Added both Andrey and Heikki in to-mail, since I'm not sure if this
> is more extreme than the multixact offset issue we had with 16.12, or it
> is at par with that.

Indeed, let's wait for at least Heikki's input.  

Anyway, for any fixes, I don't think that it would be a good idea to
skip v17 and v18, relying on the SLRU bank locks to not conflict to
bypass the WriteAll() conflict.  Let's keep all the branches across
v14~v18 in sync.
--
Michael

Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-26 08:30  Ayush Tiwari <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Ayush Tiwari @ 2026-05-26 08:30 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Radim Marek <[email protected]>; Andrey Borodin <[email protected]>; Heikki Linnakangas <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Hi,

On Tue, 26 May 2026 at 13:32, Michael Paquier <[email protected]> wrote:

> On Fri, May 22, 2026 at 10:21:32PM +0530, Ayush Tiwari wrote:
> > I think the right fix is to remove that SimpleLruWriteAll() call while
> > keeping the missing-page initialization logic.  The flush is only meant to
> > make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers
> > but have not reached disk.  In this fallback path, I don't see a way for
> > the tested next_pageno to be in that state: if RecordNewMultiXact() itself
> > initializes the page, it writes it synchronously with SimpleLruWritePage()
> > before setting last_initialized_offsets_page.
>
> FWIW, I'm having a couple of customers complaining about that as well,
> as cross-version physical replication is a thing for minor upgrade
> flows.  This bug is making suddenly recovery disruptive for some folks
> out there.  :(
>

We had a lot of replicas end up in a bad state due to multixact replay with
the 16.12 release, and had to revert them to the previous minor version until
16.13 came out, which was a blessing. Given the number of CVEs the
current release fixes, reverting is scary too.

> > I attached a small patch for REL_16_STABLE.  The same self-deadlock pattern
> > is also present on PG 14 and 15.  PG 17 and
> > 18 have the same compatibility call, but SLRU locking is banked
> > there, and RecordNewMultiXact() does not appear to hold the relevant bank
> > lock before calling SimpleLruWriteAll(), so I would not describe those
> > branches as having this exact self-deadlock, but needs more analysis.
>
> So your root argument is that while the SimpleLruWriteAll() is
> defensive, it is not actually necessary because it means that
> last_initialized_offsets_page is -1 we have not yet replayed
> ZERO_OFF_PAGE and that we have no dirty page that could make
> SimpleLruDoesPhysicalPageExist() return an incorrect result, which
> would be bad.  I am not sure to agree that this assumption is correct
> all the time, see for example the WAL message mentioned in the thread
> that has led to 77dff5d937b1:
>
> https://www.postgresql.org/message-id/33319276-e4d0-4773-89e4-09084905fdb0%40iki.fi

Right, agreed.  Thanks for pointing to that case.  My v1 patch removes
the self-deadlock, but the "no relevant dirty pages" assumption is too
strong.

The dirty page does not have to be one initialized by the current
RecordNewMultiXact() call.  It can already contain offsets replayed from
later CREATE_ID records while last_initialized_offsets_page is still -1.
In that state, relying directly on SimpleLruDoesPhysicalPageExist() can
still produce a false negative because it only checks the physical file,
not dirty SLRU buffers.  So removing the flush can maybe reintroduce
the kind of corruption that 77dff5d937b1 was trying to prevent.

> A different approach would be to release and re-acquire the
> MultiXactOffsetSLRULock while calling SimpleLruWriteAll(), and I think
> that it should be actually safe.  Even if read-only backends evict
> dirty pages between the moment the lock is released and the moment it
> is re-acquired in SimpleLruWriteAll(), the pages would be
> written to disk due to the eviction, which is what we want for
> correctness.  And only the startup process dirties offset pages during
> recovery, AFAIK.  Thoughts?
>

That sounds like the right direction to me.

Releasing MultiXactOffsetSLRULock around SimpleLruWriteAll() preserves
the flush-before-physical-check rule while avoiding the self-deadlock.
I don't see a partial-state problem from the current record at that
point, since the compatibility check happens before RecordNewMultiXact()
has modified the current offsets page.  And as you said, during recovery
the startup process should be the only process dirtying offset pages; if
a hot standby reader causes eviction while the lock is released, that
should only help by writing the dirty page out.

> > Added both Andrey and Heikki in to-mail, since I'm not sure if this
> > is more extreme than the multixact offset issue we had with 16.12, or it
> > is at par with that.
>
> Indeed, let's wait for at least Heikki's input.
>
> Anyway, for any fixes, I don't think that it would be a good idea to
> skip v17 and v18, relying on the SLRU bank locks to not conflict to
> bypass the WriteAll() conflict.  Let's keep all the branches across
> v14~v18 in sync.
>

Agreed.

Regards,
Ayush

^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-26 08:41  Andrey Borodin <[email protected]>
  parent: Ayush Tiwari <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Andrey Borodin @ 2026-05-26 08:41 UTC (permalink / raw)
  To: Ayush Tiwari <[email protected]>; +Cc: Michael Paquier <[email protected]>; Radim Marek <[email protected]>; Heikki Linnakangas <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>



> On 26 May 2026, at 13:30, Ayush Tiwari <[email protected]> wrote:
> 
> Releasing MultiXactOffsetSLRULock around SimpleLruWriteAll() preserves
> the flush-before-physical-check rule while avoiding the self-deadlock.

I think we don't need to release the lock; we just need to acquire it later, as is done
in the 17+ branches.

FWIW I'm working on a buildfarm module that will replay regress WAL generated by
REL_x_0 on REL_x_STABLE.


Best regards, Andrey Borodin.




^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-26 09:27  Michael Paquier <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Michael Paquier @ 2026-05-26 09:27 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Heikki Linnakangas <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On Tue, May 26, 2026 at 01:41:03PM +0500, Andrey Borodin wrote:
> I think we don't need to release lock, we just need to acquire it later, as it is done
> in 17+ branches.

Hmm, okay.  I am not sure what you mean here, could you demonstrate
your idea with a patch later?
--
Michael


^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-26 09:33  Andrey Borodin <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Andrey Borodin @ 2026-05-26 09:33 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Heikki Linnakangas <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>



> On 26 May 2026, at 14:27, Michael Paquier <[email protected]> wrote:
> 
> Hmm, okay.  I am not sure what you mean here, could you demonstrate
> your idea with a patch later?

Something like attached, not tested yet, working on an automated test.


Best regards, Andrey Borodin.


Attachments:

  [application/octet-stream] demo.diff (1.9K, 2-demo.diff)
  download | inline diff:
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index f825579e888..8899d5ac63d 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -888,8 +888,6 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	MultiXactOffset *next_offptr;
 	MultiXactOffset next_offset;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	/* position of this multixid in the offsets SLRU area  */
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
@@ -907,6 +905,9 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	 * multixid was assigned.  If we're replaying WAL that was generated by
 	 * such a version, the next page might not be initialized yet.  Initialize
 	 * it now.
+	 *
+	 * This block runs before acquiring MultiXactOffsetSLRULock because
+	 * SimpleLruWriteAll() needs to acquire the same lock internally.
 	 */
 	if (InRecovery && next_pageno != pageno)
 	{
@@ -951,6 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		{
 			elog(DEBUG1, "next offsets page is not initialized, initializing it now");
 
+			LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+
 			/* Create and zero the page */
 			slotno = SimpleLruZeroPage(MultiXactOffsetCtl, next_pageno);
 
@@ -958,6 +961,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 			SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 			Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
+			LWLockRelease(MultiXactOffsetSLRULock);
+
 			/*
 			 * Remember that we initialized the page, so that we don't zero it
 			 * again at the XLOG_MULTIXACT_ZERO_OFF_PAGE record.
@@ -967,6 +972,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		}
 	}
 
+	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+
 	/*
 	 * Set the starting offset of this multixid's members.
 	 *


^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-26 12:28  Heikki Linnakangas <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Heikki Linnakangas @ 2026-05-26 12:28 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; Michael Paquier <[email protected]>; +Cc: Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On 26/05/2026 12:33, Andrey Borodin wrote:
>> On 26 May 2026, at 14:27, Michael Paquier <[email protected]> wrote:
>>
>> Hmm, okay.  I am not sure what you mean here, could you demonstrate
>> your idea with a patch later?
> 
> Something like attached, not tested yet, working on an automated test.

Yeah, that looks correct to me. It moves the locking on v16 to where it 
happens on v17 and v18. I don't see any reason to hold the lock in the 
earlier parts of RecordNewMultiXact() in v16.
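
The ordering change can be illustrated with a toy, non-reentrant lock.
This is only a sketch: ToyLock and all toy_* / record_* names are
hypothetical, and toy_acquire() returns false where a real LWLock
would block forever, so the self-deadlock shows up as a failed call
instead of a hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-reentrant lock; not the PostgreSQL LWLock API. */
typedef struct ToyLock
{
	bool		held;
} ToyLock;

/* Returns false instead of blocking when the lock is already held. */
static bool
toy_acquire(ToyLock *lock)
{
	if (lock->held)
		return false;
	lock->held = true;
	return true;
}

static void
toy_release(ToyLock *lock)
{
	lock->held = false;
}

/* Stand-in for SimpleLruWriteAll(): takes the same lock internally. */
static bool
toy_flush_all(ToyLock *lock)
{
	if (!toy_acquire(lock))
		return false;			/* caller already holds it: self-deadlock */
	/* ... dirty pages would be written out here ... */
	toy_release(lock);
	return true;
}

/* Old v14-v16 ordering: take the lock first, then run the compat flush. */
static bool
record_old_order(ToyLock *lock)
{
	bool		ok;

	if (!toy_acquire(lock))
		return false;
	ok = toy_flush_all(lock);
	toy_release(lock);
	return ok;
}

/* Patched ordering: run the compat flush first, take the lock afterwards. */
static bool
record_new_order(ToyLock *lock)
{
	if (!toy_flush_all(lock))
		return false;
	if (!toy_acquire(lock))
		return false;
	/* ... the offsets page would be updated here ... */
	toy_release(lock);
	return true;
}
```

With a fresh lock, record_old_order() fails at the flush step and
record_new_order() succeeds, which mirrors the before/after behaviour
reported for the startup process.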

- Heikki






^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-26 18:29  Andrey Borodin <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  0 siblings, 2 replies; 20+ messages in thread

From: Andrey Borodin @ 2026-05-26 18:29 UTC (permalink / raw)
  To: Heikki Linnakangas <[email protected]>; +Cc: Michael Paquier <[email protected]>; Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

> On 26 May 2026, at 17:28, Heikki Linnakangas <[email protected]> wrote:
> 
> looks correct

I tested that change as follows.

Set up REL_16_0 as primary, REL_16_STABLE as standby.

Generate multixacts in a single session using savepoints:

BEGIN;
SELECT * FROM t WHERE i = 1 FOR NO KEY UPDATE;
-- repeat 2500 times:
SAVEPOINT a; SELECT * FROM t WHERE i = 1 FOR UPDATE; ROLLBACK TO a;
COMMIT;

Each iteration creates a new MultiXactId. 2500 iterations cross the SLRU page
boundary at multixact 2048 with some spare multis (we'll pickle the excess ones in
jars when all is fixed, toying with 2048 wasted dev cycles for no reason).
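
The 2048 boundary follows from the page arithmetic, sketched below.
The sketch assumes the default 8 kB block size and a 4-byte
MultiXactOffset, as on v14-v18; the function names are hypothetical
simplifications of the MultiXactIdToOffsetPage / MultiXactIdToOffsetEntry
macros, and multixact wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults: 8 kB blocks, 4-byte offsets. */
#define BLCKSZ 8192
typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))

/* Offsets SLRU page that holds the entry for this multixact. */
static int64_t
multi_to_offset_page(MultiXactId multi)
{
	return (int64_t) (multi / MULTIXACT_OFFSETS_PER_PAGE);
}

/* Position of this multixact's entry within its page. */
static int
multi_to_offset_entry(MultiXactId multi)
{
	return (int) (multi % MULTIXACT_OFFSETS_PER_PAGE);
}

/* True if the next multixact lives on a different page (no wraparound). */
static int
next_multi_crosses_page(MultiXactId multi)
{
	return multi_to_offset_page(multi + 1) != multi_to_offset_page(multi);
}
```

Under these assumptions each page holds 2048 entries, so multixact 2047
is the last entry of page 0 and its successor 2048 is the first entry
of page 1; a second batch of 2500 reaches page 2.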

Test:
0. Run the workload on REL_16_0 primary (2500 multixacts, crossing page 0->1)
1. Take pg_basebackup
2. Run the workload again (2500 more, crossing page 1->2)
3. Start the standby

I observe:
Without the change, startup deadlocks.
With the change, the standby catches up; the DEBUG1 message "next offsets page is not
initialized, initializing it now" confirms the compat block fires correctly.

I packaged this test into a buildfarm module (TestReplayXversion) [0] that
builds REL_x_0 and runs this check against a REL_x_STABLE build. It reproduces the deadlock
on 14, 15, and 16; 17 and 18 pass. Currently I'm struggling to inject the regress WAL trace
into it; that is not working so far. On the bright side, I managed to get PR number 42 in the
buildfarm client repo.

Best regards, Andrey Borodin.

[0] https://github.com/PGBuildFarm/client-code/pull/42

^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-27 00:30  Michael Paquier <[email protected]>
  parent: Andrey Borodin <[email protected]>
  1 sibling, 0 replies; 20+ messages in thread

From: Michael Paquier @ 2026-05-27 00:30 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On Tue, May 26, 2026 at 11:29:58PM +0500, Andrey Borodin wrote:
> On 26 May 2026, at 17:28, Heikki Linnakangas <[email protected]> wrote:
>> looks correct

Neither do I see an issue in doing the first steps of
RecordNewMultiXact() without holding the lock.  The consistency that
we get across all the stable branches after this patch makes the whole
logic neater.

> I observe:
> Without the change startup deadlocks.
> With the change standby catches up, the DEBUG1 message "next offsets page is not
> initialized, initializing it now" confirms the compat block fires correctly.

Cool, thanks for the patch and double-checking things, Andrey!  I did
not check the fix beyond a check-world (aka no cross-version replay
done here), but looking closely through the code I don't immediately
see why this would be wrong across the v14~v16 range.
--
Michael


^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-27 02:55  Nazneen Jafri <[email protected]>
  parent: Andrey Borodin <[email protected]>
  1 sibling, 1 reply; 20+ messages in thread

From: Nazneen Jafri @ 2026-05-27 02:55 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Michael Paquier <[email protected]>; Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Tested Andrey's demo.diff on a fresh environment:

  - Primary: REL_16_8, Standby: REL_16_14 (--enable-cassert)
  - ~2300 MultiXacts crossing the offsets page boundary
  - Without patch: startup deadlocks at RecordNewMultiXact(multi=2047)
  - With patch: standby replays all WAL and catches up


Thanks,
Nazneen


^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-27 09:06  Heikki Linnakangas <[email protected]>
  parent: Nazneen Jafri <[email protected]>
  0 siblings, 2 replies; 20+ messages in thread

From: Heikki Linnakangas @ 2026-05-27 09:06 UTC (permalink / raw)
  To: Nazneen Jafri <[email protected]>; Andrey Borodin <[email protected]>; +Cc: Michael Paquier <[email protected]>; Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On 27/05/2026 05:55, Nazneen Jafri wrote:
> Tested Andrey's demo.diff on a fresh environment:
> 
>    - Primary: REL_16_8, Standby: REL_16_14 (--enable-cassert)
> 
>    - ~2300 MultiXacts crossing the offsets page boundary
> 
>    - Without patch: startup deadlocks at RecordNewMultiXact(multi=2047)
> 
>    - With patch: standby replays all WAL and catches up

Thanks all. I have applied this to v14 - v16.

- Heikki






^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-27 12:08  Andrey Borodin <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  1 sibling, 0 replies; 20+ messages in thread

From: Andrey Borodin @ 2026-05-27 12:08 UTC (permalink / raw)
  To: Heikki Linnakangas <[email protected]>; +Cc: Nazneen Jafri <[email protected]>; Michael Paquier <[email protected]>; Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

> On 27 May 2026, at 14:06, Heikki Linnakangas <[email protected]> wrote:
> 
> I have applied this to v14 - v16.

Thanks!

I can confirm that all 5 branches are now passing the new buildfarm test module,
while 14-16 were failing it this morning.

I'll try to get this test module into a usable state and enable it on my animal.
Interestingly, the "make installcheck" regress trace was not triggering the WAL incompatibility,
so this module is now "make installcheck" + "special multixact workload" [0].
I'm not sure it is useful for finding other similar bugs...

Best regards, Andrey Borodin.

[0] https://github.com/PGBuildFarm/client-code/pull/42/changes#diff-588541281f9511e15c02bc6718535cf7cd28...

^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
@ 2026-05-28 01:12  Michael Paquier <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  1 sibling, 0 replies; 20+ messages in thread

From: Michael Paquier @ 2026-05-28 01:12 UTC (permalink / raw)
  To: Heikki Linnakangas <[email protected]>; +Cc: Nazneen Jafri <[email protected]>; Andrey Borodin <[email protected]>; Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On Wed, May 27, 2026 at 12:06:45PM +0300, Heikki Linnakangas wrote:
> Thanks all. I have applied this to v14 - v16.

Thanks for applying the fix.
--
Michael


^ permalink  raw  reply  [nested|flat] 20+ messages in thread

end of thread, other threads:[~2026-05-28 01:12 UTC | newest]

Thread overview: 20+ messages
2026-05-20 21:16 BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8 PG Bug reporting form <[email protected]>
2026-05-21 07:07 ` Andrey Borodin <[email protected]>
2026-05-21 07:12   ` Marko Tiikkaja <[email protected]>
2026-05-21 07:25     ` Andrey Borodin <[email protected]>
2026-05-21 07:45       ` Ayush Tiwari <[email protected]>
2026-05-21 08:34       ` Radim Marek <[email protected]>
2026-05-21 09:06         ` Radim Marek <[email protected]>
2026-05-22 16:51           ` Ayush Tiwari <[email protected]>
2026-05-26 08:02             ` Michael Paquier <[email protected]>
2026-05-26 08:30               ` Ayush Tiwari <[email protected]>
2026-05-26 08:41                 ` Andrey Borodin <[email protected]>
2026-05-26 09:27                   ` Michael Paquier <[email protected]>
2026-05-26 09:33                     ` Andrey Borodin <[email protected]>
2026-05-26 12:28                       ` Heikki Linnakangas <[email protected]>
2026-05-26 18:29                         ` Andrey Borodin <[email protected]>
2026-05-27 00:30                           ` Michael Paquier <[email protected]>
2026-05-27 02:55                           ` Nazneen Jafri <[email protected]>
2026-05-27 09:06                             ` Heikki Linnakangas <[email protected]>
2026-05-27 12:08                               ` Andrey Borodin <[email protected]>
2026-05-28 01:12                               ` Michael Paquier <[email protected]>
