public inbox for [email protected]
BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
20+ messages / 8 participants
* BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
From: PG Bug reporting form @ 2026-05-20 21:16 UTC
To: [email protected]; +Cc: [email protected]

The following bug has been logged on the website:

Bug reference:      19490
Logged by:          Radim Marek
Email address:      [email protected]
PostgreSQL version: 16.14
Operating system:   Linux - Ubuntu 22.04
Description:

Hello,

due to a mistake we have run a higher minor version of 16.x against the non-upgraded primary. This led to repeated issues on WAL processing.

Description:

A streaming replication standby running 16.14 stops advancing replay while WAL keeps arriving from a 16.8 primary. The startup process is parked in futex_wait_queue with wait_event = LWLock:MultiXactOffsetSLRU and no longer makes progress. pg_stat_slru shows zero MultiXact activity over the same window, so it appears to stop on the lock itself rather than inside any SLRU read/write path. Downgrading the standby binary to 16.12 (same data directory) resolved the symptom under the same workload.

Configuration:

- Primary running 16.8-1.pgdg22.04+1; we observed it both loaded and "relatively" idle (below 1000 QPS).
- Replica: 16.14-1.pgdg22.04+1, physical streaming, async, single replica on 16.14 due to misconfiguration, no cascading. Other replicas not affected (running 16.8).
- hot_standby_feedback enabled, logical replication from primary.
- Default WAL segment size. Default SLRU buffer sizes.

Observed symptoms on the standby

1. pg_stat_replication on primary, just the affected node

   client_addr  state      sent_lag  write_lag  flush_lag  replay_lag_bytes  replay_lag
   10.x.x.x     streaming  0         0          0          8766784344        02:42:50

2. Receive/write/flush all at the primary's current LSN; only replay is far behind and growing.

3. Startup process wait event on standby (sampled repeatedly, always identical)

   pid    wait_event_type  wait_event           state
   19095  LWLock           MultiXactOffsetSLRU  (null)

4. Kernel stack of the startup process

   cat /proc/19095/stack
   [<0>] futex_wait_queue+0x67/0xa0
   [<0>] __futex_wait+0x155/0x1d0
   [<0>] futex_wait+0x74/0x120
   [<0>] do_futex+0x16d/0x230
   [<0>] __x64_sys_futex+0x95/0x200
   [<0>] x64_sys_call+0x117b/0x2480
   [<0>] do_syscall_64+0x81/0x170
   [<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80

   cat /proc/19095/wchan
   futex_wait_queue

5. pg_stat_slru on the standby, after pg_stat_reset_slru(NULL) and a 60-second wait under live WAL streaming

   name             blks_zeroed  blks_hit  blks_read  blks_written
   MultiXactMember  0            0         0          0
   MultiXactOffset  0            0         0          0

6. There was no MultiXact SLRU activity while the startup process is reportedly waiting on the MultiXact offset SLRU lock.

7. Replay LSN frozen, receive LSN advancing. Sampled 60 sec apart.

   recv            replay          lag_bytes
   1476A/D1DA158   14767/EE01DB78  9111848416
   1476A/EB565D0   14767/EE01DB78  9138571864

8. No replay progress; ~9 GB of WAL buffered locally that is never applied.

9. Other backends on the standby: only a diagnostic psql client. No hot-standby readers.

10. MultiXact age on the primary is small (~360k on most DBs, ~239k on the main DB). No MultiXact storm.

Workarounds

- Restarting the standby cleared the block, but it recurred once the standby caught up.
- Downgrading the standby binary to 16.12 (16.12-1.pgdg22.04+1) against the same data directory restored normal replay. After 60s under the same workload pg_stat_slru shows only 2 hits / 0 reads on MultiXact.

I understand that running 6 minor versions behind is not a particularly good setup, but given that this is a supported direction, this might be worth at least a mention in the 16.13/16.14 release notes.

---
Hope this helps,
Radim
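The lag_bytes column in symptom 7 can be cross-checked by hand, since an LSN printed as HI/LO is a 64-bit byte position with HI as the upper 32 bits. The following is only a minimal sketch of that arithmetic; parse_lsn and lag_bytes are hypothetical helper names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a textual LSN of the form "HI/LO" (both parts hexadecimal) into a
 * 64-bit byte position. Returns 0 on success, -1 on malformed input.
 * Hypothetical helper for illustration only.
 */
int parse_lsn(const char *text, uint64_t *out)
{
    unsigned int hi, lo;

    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return -1;
    *out = ((uint64_t) hi << 32) | lo;
    return 0;
}

/* Bytes of WAL received but not yet replayed (receive LSN minus replay LSN). */
uint64_t lag_bytes(const char *recv, const char *replay)
{
    uint64_t r = 0, p = 0;
    int rc_recv = parse_lsn(recv, &r);
    int rc_replay = parse_lsn(replay, &p);

    assert(rc_recv == 0 && rc_replay == 0);
    return r - p;
}
```

Calling lag_bytes("1476A/D1DA158", "14767/EE01DB78") yields 9111848416, the first lag_bytes value in the report; the second sample works out the same way.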
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
From: Andrey Borodin @ 2026-05-21 07:07 UTC
To: [email protected]; PostgreSQL mailing lists <[email protected]>

Thanks for the report!

Oh, this seems to be from the "gift that keeps on giving" department.
Related to [0]

> On 20 May 2026, at 14:16, PG Bug reporting form <[email protected]> wrote:
>
> Downgrading the standby binary to 16.12 (16.12-1.pgdg22.04+1) against
> the same data directory restored normal replay. After 60s under the same
> workload pg_stat_slru shows only 2 hits / 0 reads on MultiXact.

Are you sure that it's not 16.11 that is resolving the problem?
Can you get a backtrace of hanging startup process with debug symbols?
Or obtain last replayed LSN and do a WAL dump in the area of deadlocked startup.

I don't see how this might be a result of [1] and [2], so, perhaps, it's some more peculiarities from [3]. But 16.12 has [3]...

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/CACV2tSw3VYS7d27ftO_cs%2BaF3M54%2BJwWBbqSGLcKoG9cvyb6EA%4...
[1] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=77dff5d937b192b85c55bc...
[2] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=23064542f8bdcbc4b6a513...
[3] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=6351669130782ed01eed3a...
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
From: Marko Tiikkaja @ 2026-05-21 07:12 UTC
To: Andrey Borodin <[email protected]>; +Cc: [email protected]; PostgreSQL mailing lists <[email protected]>

Hi Andrey,

On Thu, May 21, 2026 at 10:07 AM Andrey Borodin <[email protected]> wrote:
> Are you sure that it's not 16.11 that is resolving the problem?
> Can you get a backtrace of hanging startup process with debug symbols?

We had this problem just this morning:

#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=265, expected=0, futex_word=0x785c290170b8) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x785c290170b8) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x785c290170b8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000786048c9cbdf in do_futex_wait (sem=sem@entry=0x785c290170b8, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000786048c9cc78 in __new_sem_wait_slow64 (sem=sem@entry=0x785c290170b8, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x0000786048c9ccf1 in __new_sem_wait (sem=sem@entry=0x785c290170b8) at ./nptl/sem_wait.c:42
#6  0x0000654c8b150b86 in PGSemaphoreLock (sema=0x785c290170b8) at port/pg_sema.c:327
#7  LWLockAcquire (lock=0x785c29017a80, mode=LW_EXCLUSIVE) at storage/lmgr/./build/../src/backend/storage/lmgr/lwlock.c:1314
#8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400 <MultiXactOffsetCtlData.lto_priv.0>, allow_redirtied=<optimized out>) at access/transam/./build/../src/backend/access/transam/slru.c:1174
#9  0x0000654c8ae22719 in RecordNewMultiXact (multi=1201227775, offset=2755202388, nmembers=2, members=0x7860465ec28c) at access/transam/./build/../src/backend/access/transam/multixact.c:944
#10 0x0000654c8ae255c6 in multixact_redo (record=0x654cb292c620) at access/transam/./build/../src/backend/access/transam/multixact.c:3464
#11 0x0000654c8ae4ea2d in ApplyWalRecord (replayTLI=<synthetic pointer>, record=0x7860465ec250, xlogreader=<optimized out>) at access/transam/./build/../src/include/access/xlog_internal.h:379
#12 PerformWalRecovery () at access/transam/./build/../src/backend/access/transam/xlogrecovery.c:1782
#13 0x0000654c8ae3bcb7 in StartupXLOG () at access/transam/./build/../src/backend/access/transam/xlog.c:5452
#14 0x0000654c8b0cbe7b in StartupProcessMain () at postmaster/./build/../src/backend/postmaster/startup.c:282

We downgraded to 16.13 and the problem went away.

.m
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
From: Andrey Borodin @ 2026-05-21 07:25 UTC
To: Marko Tiikkaja <[email protected]>; +Cc: [email protected]; PostgreSQL mailing lists <[email protected]>

> On 21 May 2026, at 00:12, Marko Tiikkaja <[email protected]> wrote:
>
> #8 0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400

Thanks!

This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
If by chance you will have a backtrace of another deadlocking process - please post it.

But it's not strictly necessary for analysis, I think we can figure out what happened from the backtrace you already posted.

Best regards, Andrey Borodin.
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
From: Ayush Tiwari @ 2026-05-21 07:45 UTC
To: Andrey Borodin <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; [email protected]; PostgreSQL mailing lists <[email protected]>

Hi,

On Thu, 21 May 2026 at 12:55, Andrey Borodin <[email protected]> wrote:
> > On 21 May 2026, at 00:12, Marko Tiikkaja <[email protected]> wrote:
> >
> > #8 0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400
>
> Thanks!
>
> This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
> If by chance you will have a backtrace of another deadlocking process -
> please post it.
>
> But it's not strictly necessary for analysis, I think we can figure out
> what happened from the backtrace you already posted.

I had a look at the code that Marko's backtrace pointed at and I believe this is a straightforward self-deadlock introduced by 77dff5d937b.

In RecordNewMultiXact() on REL_16_STABLE:

    LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
    ...
    if (InRecovery && next_pageno != pageno)
    {
        ...
        if (last_initialized_offsets_page == -1)
        {
            SimpleLruWriteAll(MultiXactOffsetCtl, false);   /* <-- here */
            init_needed = !SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, next_pageno);
        }
        else
            init_needed = (last_initialized_offsets_page == pageno);
        ...
    }

The outer LWLockAcquire takes MultiXactOffsetSLRULock EXCLUSIVE. SimpleLruWriteAll() in REL_16_STABLE then does

    LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

and for the MultiXactOffsetCtl SLRU, shared->ControlLock is MultiXactOffsetSLRULock (set up by SimpleLruInit(... MultiXactOffsetSLRULock ...)). So it tries to take the very lock the same backend already holds. LWLockAcquire does not detect that and parks the process on LWLock:MultiXactOffsetSLRU forever.

That matches every datum in the report:

- wait_event = LWLock:MultiXactOffsetSLRU.
- pg_stat_slru shows zero MultiXact activity, because the SimpleLruWriteAll loop never gets past LWLockAcquire to actually write a page.
- Restart unwedges things briefly.
- The deadlock only triggers when last_initialized_offsets_page is still -1, i.e. before any XLOG_MULTIXACT_ZERO_OFF_PAGE record has been replayed in this recovery session, which is at most once per startup and consistent with the "recurs after catch-up" behaviour.

Is the "safety flush" that the comment justifies actually needed? Every offsets page that this code path initializes is synchronously written via SimpleLruWritePage() a few lines below the SimpleLruZeroPage(), with an Assert that the page is clean afterwards. So at the moment we call SimpleLruDoesPhysicalPageExist(), there shouldn't be a relevant dirty offsets page in the SLRU buffer cache that would lead to a false negative. Dropping the SimpleLruWriteAll() call therefore removes the self-deadlock without changing correctness.

Maybe I'm missing something here. Thoughts?

Regards,
Ayush
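The self-deadlock pattern described in the message above can be illustrated with a toy model. This is only a sketch: ToyLock and the toy_* functions are hypothetical stand-ins, not PostgreSQL's LWLock API, and a failed toy_try_acquire() marks the point where a real LWLockAcquire() would sleep indefinitely.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy stand-in for a non-reentrant exclusive lock. It records only whether
 * the lock is held, not by whom, so a second acquire by the current holder
 * is indistinguishable from contention by another process.
 */
typedef struct ToyLock
{
    bool held;
} ToyLock;

/* Returns true if acquired; false where a blocking acquire would sleep. */
bool toy_try_acquire(ToyLock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

void toy_release(ToyLock *lock)
{
    assert(lock->held);
    lock->held = false;
}

/* Models a flush routine that takes the control lock internally. */
bool toy_write_all(ToyLock *control_lock)
{
    if (!toy_try_acquire(control_lock))
        return false;           /* a blocking acquire would wait forever */
    /* ... dirty pages would be written here ... */
    toy_release(control_lock);
    return true;
}

/* Models the replay path: take the lock, then flush while still holding it. */
bool toy_record_new_multixact(ToyLock *offset_lock)
{
    bool acquired = toy_try_acquire(offset_lock);
    bool flushed;

    assert(acquired);
    flushed = toy_write_all(offset_lock);   /* same lock: cannot succeed */
    toy_release(offset_lock);
    return flushed;
}
```

In this model toy_record_new_multixact() always reports a failed flush, whereas toy_write_all() called without the lock held succeeds, which mirrors why the hang appears only on this particular replay path.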
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
From: Radim Marek @ 2026-05-21 08:34 UTC
To: Andrey Borodin <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Thank you for the follow-up. In the meantime I can confirm that commit 77dff5d937b1 might be the source of the originally reported issue.

Unfortunately pinning the version down to 16.12 only avoids the MultiXactOffsetSLRU self-deadlock; the standby then fails recovery after 12+ hours.

FATAL:  could not access status of transaction 24958976
DETAIL:  Could not read from file "pg_multixact/offsets/017C" at offset 221184: read too few bytes.
CONTEXT:  WAL redo at 14770/873268E8 for MultiXact/CREATE_ID: 24958975 offset 61500431 nmembers 2: 3058927188 (fornokeyupd) 3058927189 (keysh)

We are going to pin 16.13 and try that before we can safely upgrade the primary / are confident we have working PITR recovery available should we need it.

Radim

PS: Once I have some time I will try to set up a Docker-based harness to be able to replicate the original problem for later testing of the fix.
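The file name and offset in the FATAL message above follow from the multixact number by plain arithmetic. The sketch below assumes default build settings (8 kB blocks, 4-byte offsets, 32 pages per SLRU segment file); the names locate_offset, segment_name and OffsetLocation are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed defaults: 8 kB pages, 4-byte offset entries, 32 pages per segment. */
enum
{
    BLOCK_SIZE = 8192,
    OFFSETS_PER_PAGE = BLOCK_SIZE / 4,
    PAGES_PER_SEGMENT = 32
};

typedef struct OffsetLocation
{
    uint32_t page;          /* SLRU page number holding the entry */
    uint32_t entry;         /* index of the entry within that page */
    uint32_t segment;       /* segment file number */
    uint32_t byte_offset;   /* byte offset of the page within the segment file */
} OffsetLocation;

/* Map a multixact ID to the location of its entry in pg_multixact/offsets. */
OffsetLocation locate_offset(uint32_t multi)
{
    OffsetLocation loc;

    loc.page = multi / OFFSETS_PER_PAGE;
    loc.entry = multi % OFFSETS_PER_PAGE;
    loc.segment = loc.page / PAGES_PER_SEGMENT;
    loc.byte_offset = (loc.page % PAGES_PER_SEGMENT) * BLOCK_SIZE;
    return loc;
}

/* Format a segment number as a four-digit uppercase hex file name. */
void segment_name(uint32_t segment, char out[5])
{
    assert(out != NULL);
    snprintf(out, 5, "%04X", (unsigned int) segment);
}
```

Under these assumptions 24958976 is entry 0 of page 12187, which lives in segment 017C at byte offset 221184, matching the DETAIL line. The multixact in the CONTEXT line, 24958975, is the last entry (2047) of page 12186, so the failing read targets the page immediately after the one holding the record being replayed.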
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
From: Radim Marek @ 2026-05-21 09:06 UTC
To: Andrey Borodin <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Although the culprit is known, I've got more data as requested.

#0  0x00007f20e9bdb687 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f20e9bdbc8c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f20e9be6920 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x000055a71796e3ca in PGSemaphoreLock (sema=0x7f20de6d0e38) at ./build/src/backend/port/pg_sema.c:327
#4  0x000055a7179f57ed in LWLockAcquire (lock=0x7f20de6d1800, mode=mode@entry=LW_EXCLUSIVE) at ./build/../src/backend/storage/lmgr/lwlock.c:1314
#5  0x000055a71772dfb2 in SimpleLruWriteAll (ctl=ctl@entry=0x55a717e83040 <MultiXactOffsetCtlData>, allow_redirtied=allow_redirtied@entry=false) at ./build/../src/backend/access/transam/slru.c:1174
#6  0x000055a717727b6f in RecordNewMultiXact (multi=79871, offset=218449, nmembers=2, members=members@entry=0x7f20de6831ec) at ./build/../src/backend/access/transam/multixact.c:944
#7  0x000055a71772a983 in multixact_redo (record=0x55a73a8d0fc8) at ./build/../src/backend/access/transam/multixact.c:3464
#8  0x000055a71774d9b8 in ApplyWalRecord (xlogreader=<optimized out>, record=0x7f20de6831b0, replayTLI=<synthetic pointer>) at ./build/../src/backend/access/transam/xlogrecovery.c:1951
#9  PerformWalRecovery () at ./build/../src/backend/access/transam/xlogrecovery.c:1782
#10 0x000055a717740def in StartupXLOG () at ./build/../src/backend/access/transam/xlog.c:5452
#11 0x000055a71797c7e4 in StartupProcessMain () at ./build/../src/backend/postmaster/startup.c:282
#12 0x000055a717972b20 in AuxiliaryProcessMain (auxtype=auxtype@entry=StartupProcess) at ./build/../src/backend/postmaster/auxprocess.c:141
#13 0x000055a717977db3 in StartChildProcess (type=StartupProcess) at ./build/../src/backend/postmaster/postmaster.c:5381
#14 0x000055a71797bfb8 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x55a73a8d0590) at ./build/../src/backend/postmaster/postmaster.c:1463
#15 0x000055a7176a05bc in main (argc=1, argv=0x55a73a8d0590) at ./build/../src/backend/main/main.c:200

and WAL dump

rmgr: Btree       len (rec/tot): 64/ 64, tx: 336098, lsn: 1/32DE75F0, prev 1/32DE7580, desc: INSERT_LEAF off: 244, blkref #0: rel 1663/16384/16432 blk 536
rmgr: MultiXact   len (rec/tot): 54/ 54, tx: 336098, lsn: 1/32DE7630, prev 1/32DE75F0, desc: CREATE_ID 79871 offset 218449 nmembers 2: 336089 (keysh) 336098 (keysh)
rmgr: Heap        len (rec/tot): 54/ 54, tx: 336098, lsn: 1/32DE7668, prev 1/32DE7630, desc: LOCK xmax: 79871, off: 1, infobits: [IS_MULTI, LOCK_ONLY, KEYSHR_LOCK], flags: 0x00, blkref #0: rel 1663/16384/16418 blk 0
rmgr: Heap        len (rec/tot): 72/ 72, tx: 336096, lsn: 1/32DE76A0, prev 1/32DE7668, desc: HOT_UPDATE old_xmax: 336096, old_off: 52, old_infobits: [], flags: 0x20, new_xmax: 0, new_off: 149, blkref #0: rel 1663/16384/16401 blk 22
rmgr: Heap        len (rec/tot): 71/ 71, tx: 336096, lsn: 1/32DE76E8, prev 1/32DE76A0, desc: HOT_UPDATE old_xmax: 336096, old_off: 149, old_infobits: [], flags: 0x60, new_xmax: 0, new_off: 209, blkref #0: rel 1663/16384/16399 blk 6
rmgr: Heap        len (rec/tot): 79/ 79, tx: 336096, lsn: 1/32DE7730, prev 1/32DE76E8, desc: INSERT off: 150, flags: 0x00, blkref #0: rel 1663/16384/16417 blk 741
rmgr: Heap        len (rec/tot): 72/ 72, tx: 336097, lsn: 1/32DE7780, prev 1/32DE7730, desc: HOT_UPDATE old_xmax: 336097, old_off: 243, old_infobits: [], flags: 0x20, new_xmax: 0, new_off: 228, blkref #0: rel 1663/16384/16401 blk 26
rmgr: Transaction len (rec/tot): 34/ 34, tx: 336096, lsn: 1/32DE77C8, prev 1/32DE7780, desc: COMMIT 2026-05-21 08:43:07.003572 UTC

Radim
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
From: Ayush Tiwari @ 2026-05-22 16:51 UTC
To: Radim Marek <[email protected]>; Andrey Borodin <[email protected]>; Heikki Linnakangas <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Hi,

On Thu, 21 May 2026 at 14:36, Radim Marek <[email protected]> wrote:
> Although the culprit is known, I've got more data as requested.

Thanks for the additional backtrace and WAL dump. That makes the failure mode much clearer.

The latest trace shows the startup process here:

    SimpleLruWriteAll(MultiXactOffsetCtl, false)
    RecordNewMultiXact(multi=79871, offset=218449, nmembers=2, ...)
    multixact_redo()

The WAL dump also shows the matching record:

    rmgr: MultiXact ... desc: CREATE_ID 79871 offset 218449 nmembers 2

79871 is the last multixact on its offsets page, so replaying that record enters the next_pageno != pageno compatibility path added by 77dff5d937b.

On REL_14 through REL_16, RecordNewMultiXact() already holds MultiXactOffsetSLRULock while executing that code. SimpleLruWriteAll() then tries to acquire MultiXactOffsetCtl's SLRU control lock, which is the same MultiXactOffsetSLRULock on those branches. That explains the standby startup process waiting forever on LWLock:MultiXactOffsetSLRU, with no corresponding SLRU I/O activity.

I think the right fix is to remove that SimpleLruWriteAll() call while keeping the missing-page initialization logic. The flush is only meant to make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers but have not reached disk. In this fallback path, I don't see a way for the tested next_pageno to be in that state: if RecordNewMultiXact() itself initializes the page, it writes it synchronously with SimpleLruWritePage() before setting last_initialized_offsets_page.

I attached a small patch for REL_16_STABLE. The same self-deadlock pattern is also present on PG 14 and 15.
PG 17 and 18 have the same compatibility call, but SLRU locking is banked there, and RecordNewMultiXact() does not appear to hold the relevant bank lock before calling SimpleLruWriteAll(), so I would not describe those branches as having this exact self-deadlock, but that needs more analysis.

Added both Andrey and Heikki in to-mail, since I'm not sure if this is more extreme than the multixact offset issue we had with 16.12, or if it is on par with that.

Regards,
Ayush

Attachments: [application/octet-stream] v1-0001-Avoid-self-deadlock-on-MultiXactOffsetSLRULock-dur.patch (2.5K)

From b33abeede0847edac3603b87a478a832be1784f8 Mon Sep 17 00:00:00 2001
From: Ayush Tiwari <[email protected]>
Date: Thu, 21 May 2026 07:39:28 +0000
Subject: [PATCH REL_16_STABLE v1] Avoid self-deadlock on MultiXactOffsetSLRULock during WAL replay

Commit 77dff5d937b added a compatibility check in RecordNewMultiXact()
that can call SimpleLruWriteAll(MultiXactOffsetCtl, false) while already
holding MultiXactOffsetSLRULock. In REL_16, SimpleLruWriteAll() tries to
acquire the same SLRU control lock, so WAL replay can self-deadlock with
the startup process waiting on LWLock:MultiXactOffsetSLRU.

The flush is not needed for the page tested in this fallback path. If
RecordNewMultiXact() initializes that offsets page, it writes it
synchronously with SimpleLruWritePage() before updating
last_initialized_offsets_page. Drop the unsafe flush and keep the
existing missing-page initialization logic.

Reported-by: Radim Marek <[email protected]>
Reported-by: Marko Tiikkaja <[email protected]>
Diagnosed-by: Andrey Borodin <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
 src/backend/access/transam/multixact.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index f825579e888..5b6b48eb79c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -934,16 +934,17 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		 * seen any XLOG_MULTIXACT_ZERO_OFF_PAGE records yet, which should
 		 * happen at most once after starting WAL recovery.
 		 *
-		 * As an extra safety measure, if we do resort to
-		 * SimpleLruDoesPhysicalPageExist(), flush the SLRU buffers first so
-		 * that it will return an accurate result.
+		 *
+		 * We cannot call SimpleLruWriteAll() to flush the SLRU buffers
+		 * here, because that would self-deadlock on MultiXactOffsetSLRULock,
+		 * which we already hold.  Fortunately we do not need to: every
+		 * page that this code path initializes is synchronously flushed via
+		 * SimpleLruWritePage() below before this lock is released, so there
+		 * are no relevant dirty pages.
 		 *----------
 		 */
 		if (last_initialized_offsets_page == -1)
-		{
-			SimpleLruWriteAll(MultiXactOffsetCtl, false);
 			init_needed = !SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, next_pageno);
-		}
 		else
 			init_needed = (last_initialized_offsets_page == pageno);

--
2.43.0
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
From: Michael Paquier @ 2026-05-26 08:02 UTC
To: Ayush Tiwari <[email protected]>; +Cc: Radim Marek <[email protected]>; Andrey Borodin <[email protected]>; Heikki Linnakangas <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On Fri, May 22, 2026 at 10:21:32PM +0530, Ayush Tiwari wrote:
> I think the right fix is to remove that SimpleLruWriteAll() call while
> keeping the missing-page initialization logic. The flush is only meant to
> make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers
> but have not reached disk. In this fallback path, I don't see a way for
> the tested next_pageno to be in that state: if RecordNewMultiXact() itself
> initializes the page, it writes it synchronously with SimpleLruWritePage()
> before setting last_initialized_offsets_page.

FWIW, I'm having a couple of customers complaining about that as well, as cross-version physical replication is a thing for minor upgrade flows. This bug is suddenly making recovery disruptive for some folks out there. :(

> I attached a small patch for REL_16_STABLE. The same self-deadlock pattern
> is also present on PG 14 and 15. PG 17 and 18 have the same compatibility
> call, but SLRU locking is banked there, and RecordNewMultiXact() does not
> appear to hold the relevant bank lock before calling SimpleLruWriteAll(),
> so I would not describe those branches as having this exact self-deadlock,
> but needs more analysis.

So your root argument is that while the SimpleLruWriteAll() is defensive, it is not actually necessary: last_initialized_offsets_page being -1 means that we have not yet replayed a ZERO_OFF_PAGE record, and hence that we have no dirty page that could make SimpleLruDoesPhysicalPageExist() return an incorrect result, which would be bad.

I am not sure I agree that this assumption is correct all the time, see for example the WAL message mentioned in the thread that has led to 77dff5d937b1:
https://www.postgresql.org/message-id/33319276-e4d0-4773-89e4-09084905fdb0%40iki.fi

It mentions this WAL sequence, which is possible because there is no strict ordering in the creation of the mxacts:

    ZERO_PAGE:2048 -> CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047

Based on that, if we begin recovery after ZERO_PAGE:2048, we could finish with this kind of sequence:

    CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047

Looking closer, last_initialized_offsets_page stays at -1. The page for 2048 was zeroed before the checkpoint by the earlier ZERO_PAGE:2048. CREATE_ID:2048 and CREATE_ID:2049 are created first. Then comes CREATE_ID:2047, which enters the last_initialized_offsets_page branch. If we don't have the WriteAll(), the page where the offsets of 2048 and 2049 are located gets zeroed while creating 2047, corrupting the existing state of 2048 and 2049.

A different approach would be to release and re-acquire the MultiXactOffsetSLRULock while calling SimpleLruWriteAll(), and I think that it should actually be safe. Even if read-only backends evict dirty pages between the moment the lock is released and the moment it is re-acquired in SimpleLruWriteAll(), the pages would be written to disk due to the eviction, which is what we want for correctness. And only the startup process dirties offset pages during recovery, AFAIK.

Thoughts?

> Added both Andrey and Heikki in to-mail, since I'm not sure if this > is more extreme than the multixact offset issue we had with 16.12, or it > is at par with that. Indeed, let's wait for at least Heikki's input. Anyway, for any fixes, I don't think that it would be a good idea to skip v17 and v18, relying on the SLRU bank locks to not conflict to bypass the WriteAll() conflict. Let's keep all the branches across v14~v18 in sync. -- Michael Attachments: [application/pgp-signature] signature.asc (833B, 2-signature.asc) download ^ permalink raw reply [nested|flat] 20+ messages in thread
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-26 08:30 Ayush Tiwari <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread
From: Ayush Tiwari @ 2026-05-26 08:30 UTC
To: Michael Paquier <[email protected]>
+Cc: Radim Marek <[email protected]>; Andrey Borodin <[email protected]>; Heikki Linnakangas <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Hi,

On Tue, 26 May 2026 at 13:32, Michael Paquier <[email protected]> wrote:

> On Fri, May 22, 2026 at 10:21:32PM +0530, Ayush Tiwari wrote:
> > I think the right fix is to remove that SimpleLruWriteAll() call while
> > keeping the missing-page initialization logic. The flush is only meant to
> > make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers
> > but have not reached disk. In this fallback path, I don't see a way for
> > the tested next_pageno to be in that state: if RecordNewMultiXact() itself
> > initializes the page, it writes it synchronously with SimpleLruWritePage()
> > before setting last_initialized_offsets_page.
>
> FWIW, I'm having a couple of customers complaining about that as well,
> as cross-version physical replication is a thing for minor upgrade
> flows. This bug is making suddenly recovery disruptive for some folks
> out there. :(

We had a lot of replicas end up in a bad state due to multixact replay with the 16.12 release, and had to revert the minor version for them until 16.13 came out, which was a blessing. Given the number of CVEs the current release fixes, reverting is scary too.

> > I attached a small patch for REL_16_STABLE. The same self-deadlock pattern
> > is also present on PG 14 and 15. PG 17 and
> > 18 have the same compatibility call, but SLRU locking is banked
> > there, and RecordNewMultiXact() does not appear to hold the relevant bank
> > lock before calling SimpleLruWriteAll(), so I would not describe those
> > branches as having this exact self-deadlock, but needs more analysis.
>
> So your root argument is that while the SimpleLruWriteAll() is
> defensive, it is not actually necessary because it means that
> last_initialized_offsets_page is -1 we have not yet replayed
> ZERO_OFF_PAGE and that we have no dirty page that could make
> SimpleLruDoesPhysicalPageExis() return an incorrect result, which
> would be bad. I am not sure to agree that this assumption is correct
> all the time, see for example the WAL message mentioned in the thread
> that has led to 77dff5d937b1:
>
> https://www.postgresql.org/message-id/33319276-e4d0-4773-89e4-09084905fdb0%40iki.fi

Right, agreed. Thanks for pointing to that case.

My v1 patch removes the self-deadlock, but the "no relevant dirty pages" assumption is too strong. The dirty page does not have to be one initialized by the current RecordNewMultiXact() call. It can already contain offsets replayed from later CREATE_ID records while last_initialized_offsets_page is still -1. In that state, relying directly on SimpleLruDoesPhysicalPageExist() can still produce a false negative because it only checks the physical file, not dirty SLRU buffers.

So removing the flush could reintroduce the kind of corruption that 77dff5d937b1 was trying to prevent.

> A different approach would be to release and re-acquire the
> MultiXactOffsetSLRULock while calling SimpleLruWriteAll(), and I think
> that it should be actually safe. Even if read-only backends evict
> dirty pages between the moment the lock is released and the moment it
> is re-acquired in SimpleLruWriteAll(), the pages would be would be
> written to disk due to the eviction, which is what we want for
> correctness. And only the startup process dirties offset pages during
> recovery, AFAIK. Thoughts?

That sounds like the right direction to me.

Releasing MultiXactOffsetSLRULock around SimpleLruWriteAll() preserves the flush-before-physical-check rule while avoiding the self-deadlock. I don't see a partial-state problem from the current record at that point, since the compatibility check happens before RecordNewMultiXact() has modified the current offsets page.

And as you said, during recovery the startup process should be the only process dirtying offset pages; if a hot standby reader causes eviction while the lock is released, that should only help by writing the dirty page out.

> > Added both Andrey and Heikki in to-mail, since I'm not sure if this
> > is more extreme than the multixact offset issue we had with 16.12, or it
> > is at par with that.
>
> Indeed, let's wait for at least Heikki's input.
>
> Anyway, for any fixes, I don't think that it would be a good idea to
> skip v17 and v18, relying on the SLRU bank locks to not conflict to
> bypass the WriteAll() conflict. Let's keep all the branches across
> v14~v18 in sync.

Agreed.

Regards,
Ayush
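The flush-before-physical-check rule discussed in this message can be illustrated with a small toy model. This is a sketch under stated assumptions, not PostgreSQL code: `ToyPage` and every `toy_*` function are hypothetical names invented for illustration, and the model tracks a single page with three flags. It only shows why a check that looks at the file alone can report a page as missing while a dirty copy still sits in a buffer, and why flushing first avoids that false negative.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT PostgreSQL code. All names are hypothetical. */
typedef struct ToyPage
{
    bool in_buffer;  /* page is cached in a shared buffer */
    bool dirty;      /* buffered copy has not been written out yet */
    bool on_disk;    /* page exists in the physical segment file */
} ToyPage;

/* Models a physical-existence check: it looks at the file only. */
static bool
toy_physical_page_exists(const ToyPage *page)
{
    return page->on_disk;
}

/* Models a write-all flush: a dirty buffered page reaches disk. */
static void
toy_write_all(ToyPage *page)
{
    if (page->in_buffer && page->dirty)
    {
        page->on_disk = true;
        page->dirty = false;
    }
}

/*
 * Returns true when replay would conclude that the page is missing and
 * must be zeroed. flush_first selects between the two orderings.
 */
static bool
toy_would_zero_page(ToyPage *page, bool flush_first)
{
    if (flush_first)
        toy_write_all(page);
    return !toy_physical_page_exists(page);
}
```

With a page that is buffered and dirty but not yet on disk, `toy_would_zero_page(&page, false)` reports that the page would be zeroed, which in the real system would wipe offsets already replayed into the buffer; with `flush_first` set, the same page is found on disk and left alone.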
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-26 08:41 Andrey Borodin <[email protected]>
  parent: Ayush Tiwari <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread
From: Andrey Borodin @ 2026-05-26 08:41 UTC
To: Ayush Tiwari <[email protected]>
+Cc: Michael Paquier <[email protected]>; Radim Marek <[email protected]>; Heikki Linnakangas <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

> On 26 May 2026, at 13:30, Ayush Tiwari <[email protected]> wrote:
>
> Releasing MultiXactOffsetSLRULock around SimpleLruWriteAll() preserves
> the flush-before-physical-check rule while avoiding the self-deadlock.

I think we don't need to release the lock, we just need to acquire it later, as it is done in the 17+ branches.

FWIW I'm working on a buildfarm module that will take regression-test WAL generated by REL_x_0 and replay it with REL_x_STABLE.

Best regards, Andrey Borodin.
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-26 09:27 Michael Paquier <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Paquier @ 2026-05-26 09:27 UTC
To: Andrey Borodin <[email protected]>
+Cc: Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Heikki Linnakangas <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On Tue, May 26, 2026 at 01:41:03PM +0500, Andrey Borodin wrote:
> I think we don't need to release lock, we just need to acquire it later, as it is done
> in 17+ branches.

Hmm, okay. I am not sure what you mean here, could you demonstrate your idea with a patch later?
--
Michael
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-26 09:33 Andrey Borodin <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread
From: Andrey Borodin @ 2026-05-26 09:33 UTC
To: Michael Paquier <[email protected]>
+Cc: Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Heikki Linnakangas <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

> On 26 May 2026, at 14:27, Michael Paquier <[email protected]> wrote:
>
> Hmm, okay. I am not sure what you mean here, could you demonstrate
> your idea with a patch later?

Something like attached, not tested yet, working on an automated test.

Best regards, Andrey Borodin.

Attachments: [application/octet-stream] demo.diff (1.9K, 2-demo.diff)

inline diff:

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index f825579e888..8899d5ac63d 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -888,8 +888,6 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	MultiXactOffset *next_offptr;
 	MultiXactOffset next_offset;
 
-	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
-
 	/* position of this multixid in the offsets SLRU area */
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
@@ -907,6 +905,9 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 	 * multixid was assigned. If we're replaying WAL that was generated by
 	 * such a version, the next page might not be initialized yet. Initialize
 	 * it now.
+	 *
+	 * This block runs before acquiring MultiXactOffsetSLRULock because
+	 * SimpleLruWriteAll() needs to acquire the same lock internally.
 	 */
 	if (InRecovery && next_pageno != pageno)
 	{
@@ -951,6 +952,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		{
 			elog(DEBUG1, "next offsets page is not initialized, initializing it now");
 
+			LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+
 			/* Create and zero the page */
 			slotno = SimpleLruZeroPage(MultiXactOffsetCtl, next_pageno);
 
@@ -958,6 +961,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 			SimpleLruWritePage(MultiXactOffsetCtl, slotno);
 			Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]);
 
+			LWLockRelease(MultiXactOffsetSLRULock);
+
 			/*
 			 * Remember that we initialized the page, so that we don't zero it
 			 * again at the XLOG_MULTIXACT_ZERO_OFF_PAGE record.
@@ -967,6 +972,8 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		}
 	}
 
+	LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
+
 	/*
 	 * Set the starting offset of this multixid's members.
 	 *
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-26 12:28 Heikki Linnakangas <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread
From: Heikki Linnakangas @ 2026-05-26 12:28 UTC
To: Andrey Borodin <[email protected]>; Michael Paquier <[email protected]>
+Cc: Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On 26/05/2026 12:33, Andrey Borodin wrote:
>> On 26 May 2026, at 14:27, Michael Paquier <[email protected]> wrote:
>>
>> Hmm, okay. I am not sure what you mean here, could you demonstrate
>> your idea with a patch later?
>
> Something like attached, not tested yet, working on an automated test.

Yeah, that looks correct to me. It moves the locking on v16 to where it happens on v17 and v18. I don't see any reason to hold the lock in the earlier parts of RecordNewMultiXact() in v16.

- Heikki
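The lock-ordering change that this message confirms can be sketched as a toy model. The code below is an illustration under assumptions, not the PostgreSQL implementation: `ToyLock` and all `toy_*` functions are hypothetical, and where a real non-reentrant lock would block forever on a second acquire by its holder, the toy version returns `false` so that the outcome can be observed. It relies only on what the thread states, namely that SimpleLruWriteAll() takes the same lock internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-reentrant lock, NOT an LWLock. All names are hypothetical. */
typedef struct ToyLock
{
    bool held;
} ToyLock;

/* Returns false where a real non-reentrant lock would block forever. */
static bool
toy_acquire(ToyLock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

static void
toy_release(ToyLock *lock)
{
    lock->held = false;
}

/* Models a flush routine that takes the lock internally. */
static bool
toy_write_all(ToyLock *lock)
{
    if (!toy_acquire(lock))
        return false;           /* would self-deadlock */
    toy_release(lock);
    return true;
}

/* Old ordering: lock taken at entry, flush called while holding it. */
static bool
toy_record_old_order(ToyLock *lock)
{
    bool ok;

    if (!toy_acquire(lock))
        return false;
    ok = toy_write_all(lock);   /* second acquire by the same holder */
    toy_release(lock);
    return ok;
}

/* New ordering: flush first, take the lock only for the update. */
static bool
toy_record_new_order(ToyLock *lock)
{
    if (!toy_write_all(lock))
        return false;
    if (!toy_acquire(lock))
        return false;
    /* ... the offsets update would happen here ... */
    toy_release(lock);
    return true;
}
```

In this model `toy_record_old_order()` fails at the nested acquire, which corresponds to the startup process waiting on a lock it already holds, while `toy_record_new_order()` completes and leaves the lock released.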
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-26 18:29 Andrey Borodin <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  0 siblings, 2 replies; 20+ messages in thread
From: Andrey Borodin @ 2026-05-26 18:29 UTC
To: Heikki Linnakangas <[email protected]>
+Cc: Michael Paquier <[email protected]>; Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

> On 26 May 2026, at 17:28, Heikki Linnakangas <[email protected]> wrote:
>
> looks correct

I tested that change as follows.

Set up REL_16_0 as primary, REL_16_STABLE as standby.

Generate multixacts in a single session using savepoints:

    BEGIN;
    SELECT * FROM t WHERE i = 1 FOR NO KEY UPDATE;
    -- repeat 2500 times:
    SAVEPOINT a; SELECT * FROM t WHERE i = 1 FOR UPDATE; ROLLBACK TO a;
    COMMIT;

Each iteration creates a new MultiXactId. 2500 iterations cross the SLRU page boundary at multixact 2048 with some spare multis (we'll pickle the excess ones in jars when all is fixed, toying with 2048 wasted dev cycles for no reason).

Test:
0. Run the workload on REL_16_0 primary (2500 multixacts, crossing page 0->1)
1. Take pg_basebackup
2. Run the workload again (2500 more, crossing page 1->2)
3. Start the standby

I observe:
Without the change, startup deadlocks.
With the change, the standby catches up, and the DEBUG1 message "next offsets page is not initialized, initializing it now" confirms the compat block fires correctly.

I packaged this test into a buildfarm module (TestReplayXversion) [0] that builds REL_x_0 and runs this check on a REL_x_STABLE build. It reproduces the deadlock on 14, 15, and 16; 17 and 18 pass. Currently I'm struggling to inject the regress WAL trace into it; that is not working so far. On the bright side, I managed to get PR number 42 in the buildfarm client repo.

Best regards, Andrey Borodin.

[0] https://github.com/PGBuildFarm/client-code/pull/42
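The page-boundary arithmetic behind this test can be written out as a short sketch. It assumes the default 8 kB block size and a 4-byte offset per multixact, which gives the 2048 entries per page that the thread mentions; the `TOY_*` macros and `toy_*` functions are hypothetical stand-ins rather than the macros in multixact.c, and multixact ID wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: 8 kB pages and 4-byte offsets. Names are hypothetical. */
#define TOY_BLCKSZ 8192
typedef uint32_t ToyMultiXactId;
typedef uint32_t ToyMultiXactOffset;
#define TOY_OFFSETS_PER_PAGE (TOY_BLCKSZ / sizeof(ToyMultiXactOffset))

/* Offsets page that holds the entry for this multixact. */
static int64_t
toy_offset_page(ToyMultiXactId multi)
{
    return (int64_t) (multi / TOY_OFFSETS_PER_PAGE);
}

/* Slot of this multixact within its offsets page. */
static int
toy_offset_entry(ToyMultiXactId multi)
{
    return (int) (multi % TOY_OFFSETS_PER_PAGE);
}

/*
 * True when the next multixact lives on a different page, which is the
 * condition under which the compatibility block in the patch is entered.
 * Wraparound of the ID space is not modeled here.
 */
static bool
toy_next_is_on_other_page(ToyMultiXactId multi)
{
    return toy_offset_page(multi + 1) != toy_offset_page(multi);
}
```

Under these assumptions multixact 2047 is the last entry of page 0 and its successor 2048 is the first entry of page 1, so 2500 multixacts cross from page 0 to page 1 and a second batch of 2500 crosses from page 1 to page 2 at multixact 4096.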
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-27 00:30 Michael Paquier <[email protected]>
  parent: Andrey Borodin <[email protected]>
  1 sibling, 0 replies; 20+ messages in thread
From: Michael Paquier @ 2026-05-27 00:30 UTC
To: Andrey Borodin <[email protected]>
+Cc: Heikki Linnakangas <[email protected]>; Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On Tue, May 26, 2026 at 11:29:58PM +0500, Andrey Borodin wrote:
> On 26 May 2026, at 17:28, Heikki Linnakangas <[email protected]> wrote:
>> looks correct

Neither do I see an issue in doing the first steps of RecordNewMultiXact() without holding the lock. The consistency that we get across all the stable branches after this patch makes the whole logic neater.

> I observe:
> Without the change startup deadlocks.
> With the change standby catches up, the DEBUG1 message "next offsets page is not
> initialized, initializing it now" confirms the compat block fires correctly.

Cool, thanks for the patch and double-checking things, Andrey! I did not check the fix beyond a check-world (aka no cross-version replay done here), but looking closely through the code I don't immediately see why this would be wrong across the v14~v16 range.
--
Michael
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-27 02:55 Nazneen Jafri <[email protected]>
  parent: Andrey Borodin <[email protected]>
  1 sibling, 1 reply; 20+ messages in thread
From: Nazneen Jafri @ 2026-05-27 02:55 UTC
To: Andrey Borodin <[email protected]>
+Cc: Heikki Linnakangas <[email protected]>; Michael Paquier <[email protected]>; Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Tested Andrey's demo.diff on a fresh environment:

- Primary: REL_16_8, Standby: REL_16_14 (--enable-cassert)
- ~2300 MultiXacts crossing the offsets page boundary
- Without patch: startup deadlocks at RecordNewMultiXact(multi=2047)
- With patch: standby replays all WAL and catches up

Thanks,
Nazneen

On Tue, May 26, 2026 at 2:55 PM Andrey Borodin <[email protected]> wrote:

> > On 26 May 2026, at 17:28, Heikki Linnakangas <[email protected]> wrote:
> >
> > looks correct
>
> I tested that change as follows.
>
> Setted up REL_16_0 as primary, REL_16_STABLE as standby.
>
> Generate multixacts in a single session using savepoints:
>
> BEGIN;
> SELECT * FROM t WHERE i = 1 FOR NO KEY UPDATE;
> -- repeat 2500 times:
> SAVEPOINT a; SELECT * FROM t WHERE i = 1 FOR UPDATE; ROLLBACK TO a;
> COMMIT;
>
> Each iteration creates a new MultiXactId. 2500 iterations cross the SLRU page
> boundary at multixact 2048 with some spare multis (we'll pickle the excess ones in
> jars when all is fixed, toying with 2048 wasted dev cycles for no reason).
>
> Test:
> 0. Run the workload on REL_16_0 primary (2500 multixacts, crossing page 0->1)
> 1. Take pg_basebackup
> 2. Run the workload again (2500 more, crossing page 1->2)
> 3. Start the standby
>
> I observe:
> Without the change startup deadlocks.
> With the change standby catches up, the DEBUG1 message "next offsets page is not
> initialized, initializing it now" confirms the compat block fires correctly.
>
> I packaged this test into a buildfarm module (TestReplayXversion) [0] that
> builds REL_x_0 and runs this check on REL_x_STABLE build. It reproduces the deadlock
> on 14, 15, and 16; 17 and 18 pass. Currently I'm struggling to inject regress WAL trace
> into it, not working so far. On a bright side - I managed to get PR number 42 in buildfarm
> client repo.
>
> Best regards, Andrey Borodin.
>
> [0] https://github.com/PGBuildFarm/client-code/pull/42
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-27 09:06 Heikki Linnakangas <[email protected]>
  parent: Nazneen Jafri <[email protected]>
  0 siblings, 2 replies; 20+ messages in thread
From: Heikki Linnakangas @ 2026-05-27 09:06 UTC
To: Nazneen Jafri <[email protected]>; Andrey Borodin <[email protected]>
+Cc: Michael Paquier <[email protected]>; Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On 27/05/2026 05:55, Nazneen Jafri wrote:
> Tested Andrey's demo.diff on a fresh environment:
>
> - Primary: REL_16_8, Standby: REL_16_14 (--enable-cassert)
>
> - ~2300 MultiXacts crossing the offsets page boundary
>
> - Without patch: startup deadlocks at RecordNewMultiXact(multi=2047)
>
> - With patch: standby replays all WAL and catches up

Thanks all. I have applied this to v14 - v16.

- Heikki
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-27 12:08 Andrey Borodin <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  1 sibling, 0 replies; 20+ messages in thread
From: Andrey Borodin @ 2026-05-27 12:08 UTC
To: Heikki Linnakangas <[email protected]>
+Cc: Nazneen Jafri <[email protected]>; Michael Paquier <[email protected]>; Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

> On 27 May 2026, at 14:06, Heikki Linnakangas <[email protected]> wrote:
>
> I have applied this to v14 - v16.

Thanks! I can confirm that all 5 branches are now passing the new buildfarm test module, while 14-16 were failing it this morning. I'll try to get this test module into a usable state and enable it on my animal.

Interestingly, the "make installcheck" regress trace was not triggering the WAL incompatibility, so this module is now "make installcheck" + "special multixact workload" [0]. I'm not sure it is useful for finding other similar bugs...

Best regards, Andrey Borodin.

[0] https://github.com/PGBuildFarm/client-code/pull/42/changes#diff-588541281f9511e15c02bc6718535cf7cd28...
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-28 01:12 Michael Paquier <[email protected]>
  parent: Heikki Linnakangas <[email protected]>
  1 sibling, 0 replies; 20+ messages in thread
From: Michael Paquier @ 2026-05-28 01:12 UTC
To: Heikki Linnakangas <[email protected]>
+Cc: Nazneen Jafri <[email protected]>; Andrey Borodin <[email protected]>; Ayush Tiwari <[email protected]>; Radim Marek <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On Wed, May 27, 2026 at 12:06:45PM +0300, Heikki Linnakangas wrote:
> Thanks all. I have applied this to v14 - v16.

Thanks for applying the fix.
--
Michael
end of thread

Thread overview: 20+ messages
2026-05-20 21:16 BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8 PG Bug reporting form <[email protected]>
2026-05-21 07:07 ` Andrey Borodin <[email protected]>
2026-05-21 07:12 ` Marko Tiikkaja <[email protected]>
2026-05-21 07:25 ` Andrey Borodin <[email protected]>
2026-05-21 07:45 ` Ayush Tiwari <[email protected]>
2026-05-21 08:34 ` Radim Marek <[email protected]>
2026-05-21 09:06 ` Radim Marek <[email protected]>
2026-05-22 16:51 ` Ayush Tiwari <[email protected]>
2026-05-26 08:02 ` Michael Paquier <[email protected]>
2026-05-26 08:30 ` Ayush Tiwari <[email protected]>
2026-05-26 08:41 ` Andrey Borodin <[email protected]>
2026-05-26 09:27 ` Michael Paquier <[email protected]>
2026-05-26 09:33 ` Andrey Borodin <[email protected]>
2026-05-26 12:28 ` Heikki Linnakangas <[email protected]>
2026-05-26 18:29 ` Andrey Borodin <[email protected]>
2026-05-27 00:30 ` Michael Paquier <[email protected]>
2026-05-27 02:55 ` Nazneen Jafri <[email protected]>
2026-05-27 09:06 ` Heikki Linnakangas <[email protected]>
2026-05-27 12:08 ` Andrey Borodin <[email protected]>
2026-05-28 01:12 ` Michael Paquier <[email protected]>