public inbox for [email protected]
BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
9+ messages / 6 participants
* BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-20 21:16 PG Bug reporting form <[email protected]>
From: PG Bug reporting form @ 2026-05-20 21:16 UTC
To: [email protected]; +Cc: [email protected]

The following bug has been logged on the website:

Bug reference:      19490
Logged by:          Radim Marek
Email address:      [email protected]
PostgreSQL version: 16.14
Operating system:   Linux - Ubuntu 22.04
Description:

Hello,

due to a mistake we ran a higher minor version of 16.x against the
non-upgraded primary. This led to repeated issues with WAL processing.

Description:

A streaming replication standby running 16.14 stops advancing replay while
WAL keeps arriving from a 16.8 primary. The startup process is parked in
futex_wait_queue with wait_event = LWLock:MultiXactOffsetSLRU and no longer
makes progress. pg_stat_slru shows zero MultiXact activity over the same
window, so it appears to stop on the lock itself rather than inside any SLRU
read/write path. Downgrading the standby binary to 16.12 (same data
directory) resolved the symptom under the same workload.

Configuration:

Primary: 16.8-1.pgdg22.04+1; observed both under load and "relatively" idle
(below 1000 QPS).
Replica: 16.14-1.pgdg22.04+1, physical streaming, async, single replica on
16.14 due to misconfiguration, no cascading. Other replicas not affected
(running 16.8). hot_standby_feedback enabled, logical replication from
primary. Default WAL segment size. Default SLRU buffer sizes.

Observed symptoms on the standby

1. pg_stat_replication on primary, just the affected node

 client_addr | state     | sent_lag | write_lag | flush_lag | replay_lag_bytes | replay_lag
-------------+-----------+----------+-----------+-----------+------------------+------------
 10.x.x.x    | streaming |        0 |         0 |         0 |       8766784344 | 02:42:50

2. Receive/write/flush are all at the primary's current LSN; only replay is
far behind and growing.

3. Startup process wait event on standby (sampled repeatedly, always
identical)

  pid  | wait_event_type | wait_event          | state
-------+-----------------+---------------------+--------
 19095 | LWLock          | MultiXactOffsetSLRU | (null)

4. Kernel stack of the startup process

cat /proc/19095/stack
[<0>] futex_wait_queue+0x67/0xa0
[<0>] __futex_wait+0x155/0x1d0
[<0>] futex_wait+0x74/0x120
[<0>] do_futex+0x16d/0x230
[<0>] __x64_sys_futex+0x95/0x200
[<0>] x64_sys_call+0x117b/0x2480
[<0>] do_syscall_64+0x81/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80

cat /proc/19095/wchan
futex_wait_queue

5. pg_stat_slru on the standby, after pg_stat_reset_slru(NULL) and a
60-second wait under live WAL streaming

 name            | blks_zeroed | blks_hit | blks_read | blks_written
-----------------+-------------+----------+-----------+--------------
 MultiXactMember |           0 |        0 |         0 |            0
 MultiXactOffset |           0 |        0 |         0 |            0

6. There was no MultiXact SLRU activity while the startup process was
reportedly waiting on the MultiXact offset SLRU lock.

7. Replay LSN frozen, receive LSN advancing. Sampled 60 sec apart.

 recv          | replay         | lag_bytes
---------------+----------------+------------
 1476A/D1DA158 | 14767/EE01DB78 | 9111848416
 1476A/EB565D0 | 14767/EE01DB78 | 9138571864

8. No replay progress; ~9 GB of WAL buffered locally that is never applied.

9. Other backends on the standby: only a diagnostic psql client. No
hot-standby readers.

10. MultiXact age on the primary is small (~360k on most DBs, ~239k on the
main DB). No MultiXact storm.

Workarounds

- Restarting the standby cleared the block, but once it caught up the hang
  recurred.
- Downgrading the standby binary to 16.12 (16.12-1.pgdg22.04+1) against
  the same data directory restored normal replay. After 60s under the same
  workload pg_stat_slru shows only 2 hits / 0 reads on MultiXact.

I understand that running 6 minor versions behind is not a particularly good
setup, but given that this is a supported direction, it might be worth at
least a mention in the 16.13/16.14 release notes.

---
Hope this helps,

Radim
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-21 07:07 Andrey Borodin <[email protected]>
  parent: PG Bug reporting form <[email protected]>
From: Andrey Borodin @ 2026-05-21 07:07 UTC
To: [email protected]; PostgreSQL mailing lists <[email protected]>

Thanks for the report!

Oh, this seems to be from the "gift that keeps on giving" department.
Related to [0]

> On 20 May 2026, at 14:16, PG Bug reporting form <[email protected]> wrote:
>
> Downgrading the standby binary to 16.12 (16.12-1.pgdg22.04+1) against
> the same data directory restored normal replay. After 60s under the same
> workload pg_stat_slru shows only 2 hits / 0 reads on MultiXact.

Are you sure that it's not 16.11 that is resolving the problem?
Can you get a backtrace of the hanging startup process with debug symbols?
Or obtain the last replayed LSN and do a WAL dump in the area of the
deadlocked startup.

I don't see how this might be a result of [1] and [2], so, perhaps, it's
some more peculiarities from [3]. But 16.12 has [3]...

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/CACV2tSw3VYS7d27ftO_cs%2BaF3M54%2BJwWBbqSGLcKoG9cvyb6EA%4...
[1] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=77dff5d937b192b85c55bc...
[2] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=23064542f8bdcbc4b6a513...
[3] https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_16_STABLE&id=6351669130782ed01eed3a...
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-21 07:12 Marko Tiikkaja <[email protected]>
  parent: Andrey Borodin <[email protected]>
From: Marko Tiikkaja @ 2026-05-21 07:12 UTC
To: Andrey Borodin <[email protected]>; +Cc: [email protected]; PostgreSQL mailing lists <[email protected]>

Hi Andrey,

On Thu, May 21, 2026 at 10:07 AM Andrey Borodin <[email protected]> wrote:
> Are you sure that it's not 16.11 that is resolving the problem?
> Can you get a backtrace of hanging startup process with debug symbols?

We had this problem just this morning:

#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=265, expected=0, futex_word=0x785c290170b8) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x785c290170b8) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x785c290170b8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000786048c9cbdf in do_futex_wait (sem=sem@entry=0x785c290170b8, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000786048c9cc78 in __new_sem_wait_slow64 (sem=sem@entry=0x785c290170b8, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x0000786048c9ccf1 in __new_sem_wait (sem=sem@entry=0x785c290170b8) at ./nptl/sem_wait.c:42
#6  0x0000654c8b150b86 in PGSemaphoreLock (sema=0x785c290170b8) at port/pg_sema.c:327
#7  LWLockAcquire (lock=0x785c29017a80, mode=LW_EXCLUSIVE) at storage/lmgr/./build/../src/backend/storage/lmgr/lwlock.c:1314
#8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400 <MultiXactOffsetCtlData.lto_priv.0>, allow_redirtied=<optimized out>) at access/transam/./build/../src/backend/access/transam/slru.c:1174
#9  0x0000654c8ae22719 in RecordNewMultiXact (multi=1201227775, offset=2755202388, nmembers=2, members=0x7860465ec28c) at access/transam/./build/../src/backend/access/transam/multixact.c:944
#10 0x0000654c8ae255c6 in multixact_redo (record=0x654cb292c620) at access/transam/./build/../src/backend/access/transam/multixact.c:3464
#11 0x0000654c8ae4ea2d in ApplyWalRecord (replayTLI=<synthetic pointer>, record=0x7860465ec250, xlogreader=<optimized out>) at access/transam/./build/../src/include/access/xlog_internal.h:379
#12 PerformWalRecovery () at access/transam/./build/../src/backend/access/transam/xlogrecovery.c:1782
#13 0x0000654c8ae3bcb7 in StartupXLOG () at access/transam/./build/../src/backend/access/transam/xlog.c:5452
#14 0x0000654c8b0cbe7b in StartupProcessMain () at postmaster/./build/../src/backend/postmaster/startup.c:282

We downgraded to 16.13 and the problem went away.

.m
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-21 07:25 Andrey Borodin <[email protected]>
  parent: Marko Tiikkaja <[email protected]>
From: Andrey Borodin @ 2026-05-21 07:25 UTC
To: Marko Tiikkaja <[email protected]>; +Cc: [email protected]; PostgreSQL mailing lists <[email protected]>

> On 21 May 2026, at 00:12, Marko Tiikkaja <[email protected]> wrote:
>
> #8 0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400

Thanks!

This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
If by chance you have a backtrace of another deadlocking process,
please post it.

But it's not strictly necessary for analysis; I think we can figure out
what happened from the backtrace you already posted.

Best regards, Andrey Borodin.
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-21 07:45 Ayush Tiwari <[email protected]>
  parent: Andrey Borodin <[email protected]>
From: Ayush Tiwari @ 2026-05-21 07:45 UTC
To: Andrey Borodin <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; [email protected]; PostgreSQL mailing lists <[email protected]>

Hi,

On Thu, 21 May 2026 at 12:55, Andrey Borodin <[email protected]> wrote:

> > On 21 May 2026, at 00:12, Marko Tiikkaja <[email protected]> wrote:
> >
> > #8 0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400
>
> Thanks!
>
> This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
> If by chance you will have a backtrace of another deadlocking process -
> please post it.
>
> But it's not strictly necessary for analysis, I think we can figure out
> what happened from the backtrace you already posted.

I had a look at the code that Marko's backtrace pointed at, and I believe
this is a straightforward self-deadlock introduced by 77dff5d937b.

In RecordNewMultiXact() on REL_16_STABLE:

    LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE);
    ...
    if (InRecovery && next_pageno != pageno)
    {
        ...
        if (last_initialized_offsets_page == -1)
        {
            SimpleLruWriteAll(MultiXactOffsetCtl, false);   /* <-- here */
            init_needed = !SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, next_pageno);
        }
        else
            init_needed = (last_initialized_offsets_page == pageno);
        ...
    }

The outer LWLockAcquire takes MultiXactOffsetSLRULock EXCLUSIVE.
SimpleLruWriteAll() in REL_16_STABLE then does

    LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

and for the MultiXactOffsetCtl SLRU, shared->ControlLock is
MultiXactOffsetSLRULock (set up by
SimpleLruInit(... MultiXactOffsetSLRULock ...)). So it tries to take the
very lock the same backend already holds. LWLockAcquire does not detect
that and parks the process on LWLock:MultiXactOffsetSLRU forever.

That matches every datum in the report:

- wait_event = LWLock:MultiXactOffsetSLRU.
- pg_stat_slru shows zero MultiXact activity, because the
  SimpleLruWriteAll loop never gets past LWLockAcquire to actually write
  a page.
- Restart unwedges things briefly.
- The deadlock only triggers when last_initialized_offsets_page is still
  -1, i.e. before any XLOG_MULTIXACT_ZERO_OFF_PAGE record has been
  replayed in this recovery session, which is at most once per startup
  and consistent with the "recurs after catch-up" behaviour.

Is the "safety flush" that the comment justifies actually needed?

Every offsets page that this code path initializes is synchronously
written via SimpleLruWritePage() a few lines below the
SimpleLruZeroPage(), with an Assert that the page is clean afterwards.
So at the moment we call SimpleLruDoesPhysicalPageExist(), there
shouldn't be a relevant dirty offsets page in the SLRU buffer cache that
would lead to a false negative.

Dropping the SimpleLruWriteAll() call therefore removes the self-deadlock
without changing correctness.

Maybe I'm missing something here. Thoughts?

Regards,
Ayush
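The self-deadlock pattern described in the message above is not specific to PostgreSQL; it can be reproduced with any non-recursive lock. The following is a minimal sketch, assuming POSIX threads. The names control_lock, write_all() and record_new() are hypothetical stand-ins chosen for illustration, not PostgreSQL code. The mutex is created as PTHREAD_MUTEX_ERRORCHECK so that the second acquisition by the same thread returns EDEADLK instead of sleeping; LWLockAcquire(), as noted above, performs no such check and simply waits.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the SLRU control lock. */
static pthread_mutex_t control_lock;

/* Create the lock as error-checking so re-acquisition is reported. */
static void init_control_lock(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&control_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Stand-in for a flush routine that takes the control lock itself. */
static int write_all(void)
{
    int rc = pthread_mutex_lock(&control_lock);

    if (rc != 0)
        return rc;              /* EDEADLK if this thread already holds it */
    /* ... dirty pages would be written here ... */
    pthread_mutex_unlock(&control_lock);
    return 0;
}

/* Stand-in for a caller that already holds the lock when it flushes. */
static int record_new(void)
{
    int rc = pthread_mutex_lock(&control_lock);

    assert(rc == 0);            /* first acquisition succeeds */
    rc = write_all();           /* second acquisition by the same thread */
    pthread_mutex_unlock(&control_lock);
    return rc;
}
```

After init_control_lock(), calling record_new() reports EDEADLK, while calling write_all() on its own succeeds. With a plain lock that has no owner check, the nested call would block forever, which is the state the startup process was observed in.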
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-21 08:34 Radim Marek <[email protected]>
  parent: Andrey Borodin <[email protected]>
From: Radim Marek @ 2026-05-21 08:34 UTC
To: Andrey Borodin <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Thank you for the follow-up. In the meantime I can confirm that commit
77dff5d937b1 might be the source of the originally reported issue.

Unfortunately, pinning the version down to 16.12 only avoids the
MultiXactOffsetSLRU self-deadlock; the standby then fails recovery
after 12+ hours.

FATAL: could not access status of transaction 24958976
DETAIL: Could not read from file "pg_multixact/offsets/017C" at offset 221184: read too few bytes.
CONTEXT: WAL redo at 14770/873268E8 for MultiXact/CREATE_ID: 24958975 offset 61500431 nmembers 2: 3058927188 (fornokeyupd) 3058927189 (keysh)

We are going to pin 16.13 and try that until we can safely upgrade the
primary and are confident we have working PITR recovery available should
we need it.

Radim

PS: Once I have some time I will try to set up a Docker-based harness to
reproduce the original problem for later testing of the fix.

On Thu, 21 May 2026 at 09:25, Andrey Borodin <[email protected]> wrote:

> > On 21 May 2026, at 00:12, Marko Tiikkaja <[email protected]> wrote:
> >
> > #8 0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400
>
> Thanks!
>
> This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
> If by chance you will have a backtrace of another deadlocking process -
> please post it.
>
> But it's not strictly necessary for analysis, I think we can figure out
> what happened from the backtrace you already posted.
>
> Best regards, Andrey Borodin.
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-21 09:06 Radim Marek <[email protected]>
  parent: Radim Marek <[email protected]>
From: Radim Marek @ 2026-05-21 09:06 UTC
To: Andrey Borodin <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Although the culprit is known, I've got more data as requested.

#0  0x00007f20e9bdb687 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f20e9bdbc8c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f20e9be6920 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x000055a71796e3ca in PGSemaphoreLock (sema=0x7f20de6d0e38) at ./build/src/backend/port/pg_sema.c:327
#4  0x000055a7179f57ed in LWLockAcquire (lock=0x7f20de6d1800, mode=mode@entry=LW_EXCLUSIVE) at ./build/../src/backend/storage/lmgr/lwlock.c:1314
#5  0x000055a71772dfb2 in SimpleLruWriteAll (ctl=ctl@entry=0x55a717e83040 <MultiXactOffsetCtlData>, allow_redirtied=allow_redirtied@entry=false) at ./build/../src/backend/access/transam/slru.c:1174
#6  0x000055a717727b6f in RecordNewMultiXact (multi=79871, offset=218449, nmembers=2, members=members@entry=0x7f20de6831ec) at ./build/../src/backend/access/transam/multixact.c:944
#7  0x000055a71772a983 in multixact_redo (record=0x55a73a8d0fc8) at ./build/../src/backend/access/transam/multixact.c:3464
#8  0x000055a71774d9b8 in ApplyWalRecord (xlogreader=<optimized out>, record=0x7f20de6831b0, replayTLI=<synthetic pointer>) at ./build/../src/backend/access/transam/xlogrecovery.c:1951
#9  PerformWalRecovery () at ./build/../src/backend/access/transam/xlogrecovery.c:1782
#10 0x000055a717740def in StartupXLOG () at ./build/../src/backend/access/transam/xlog.c:5452
#11 0x000055a71797c7e4 in StartupProcessMain () at ./build/../src/backend/postmaster/startup.c:282
#12 0x000055a717972b20 in AuxiliaryProcessMain (auxtype=auxtype@entry=StartupProcess) at ./build/../src/backend/postmaster/auxprocess.c:141
#13 0x000055a717977db3 in StartChildProcess (type=StartupProcess) at ./build/../src/backend/postmaster/postmaster.c:5381
#14 0x000055a71797bfb8 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x55a73a8d0590) at ./build/../src/backend/postmaster/postmaster.c:1463
#15 0x000055a7176a05bc in main (argc=1, argv=0x55a73a8d0590) at ./build/../src/backend/main/main.c:200

and WAL dump

rmgr: Btree       len (rec/tot):     64/    64, tx:     336098, lsn: 1/32DE75F0, prev 1/32DE7580, desc: INSERT_LEAF off: 244, blkref #0: rel 1663/16384/16432 blk 536
rmgr: MultiXact   len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7630, prev 1/32DE75F0, desc: CREATE_ID 79871 offset 218449 nmembers 2: 336089 (keysh) 336098 (keysh)
rmgr: Heap        len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7668, prev 1/32DE7630, desc: LOCK xmax: 79871, off: 1, infobits: [IS_MULTI, LOCK_ONLY, KEYSHR_LOCK], flags: 0x00, blkref #0: rel 1663/16384/16418 blk 0
rmgr: Heap        len (rec/tot):     72/    72, tx:     336096, lsn: 1/32DE76A0, prev 1/32DE7668, desc: HOT_UPDATE old_xmax: 336096, old_off: 52, old_infobits: [], flags: 0x20, new_xmax: 0, new_off: 149, blkref #0: rel 1663/16384/16401 blk 22
rmgr: Heap        len (rec/tot):     71/    71, tx:     336096, lsn: 1/32DE76E8, prev 1/32DE76A0, desc: HOT_UPDATE old_xmax: 336096, old_off: 149, old_infobits: [], flags: 0x60, new_xmax: 0, new_off: 209, blkref #0: rel 1663/16384/16399 blk 6
rmgr: Heap        len (rec/tot):     79/    79, tx:     336096, lsn: 1/32DE7730, prev 1/32DE76E8, desc: INSERT off: 150, flags: 0x00, blkref #0: rel 1663/16384/16417 blk 741
rmgr: Heap        len (rec/tot):     72/    72, tx:     336097, lsn: 1/32DE7780, prev 1/32DE7730, desc: HOT_UPDATE old_xmax: 336097, old_off: 243, old_infobits: [], flags: 0x20, new_xmax: 0, new_off: 228, blkref #0: rel 1663/16384/16401 blk 26
rmgr: Transaction len (rec/tot):     34/    34, tx:     336096, lsn: 1/32DE77C8, prev 1/32DE7780, desc: COMMIT 2026-05-21 08:43:07.003572 UTC

Radim
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-22 16:51 Ayush Tiwari <[email protected]>
  parent: Radim Marek <[email protected]>
From: Ayush Tiwari @ 2026-05-22 16:51 UTC
To: Radim Marek <[email protected]>; Andrey Borodin <[email protected]>; Heikki Linnakangas <[email protected]>; +Cc: Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

Hi,

On Thu, 21 May 2026 at 14:36, Radim Marek <[email protected]> wrote:

> Although the culprit is known, I've got more data as requested.
>
> [...]

Thanks for the additional backtrace and WAL dump. That makes the failure
mode much clearer.

The latest trace shows the startup process here:

    SimpleLruWriteAll(MultiXactOffsetCtl, false)
    RecordNewMultiXact(multi=79871, offset=218449, nmembers=2, ...)
    multixact_redo()

The WAL dump also shows the matching record:

    rmgr: MultiXact ... desc: CREATE_ID 79871 offset 218449 nmembers 2

79871 is the last multixact on its offsets page, so replaying that record
enters the next_pageno != pageno compatibility path added by 77dff5d937b.

On REL_14 through REL_16, RecordNewMultiXact() already holds
MultiXactOffsetSLRULock while executing that code. SimpleLruWriteAll()
then tries to acquire MultiXactOffsetCtl's SLRU control lock, which is the
same MultiXactOffsetSLRULock on those branches. That explains the standby
startup process waiting forever on LWLock:MultiXactOffsetSLRU, with no
corresponding SLRU I/O activity.

I think the right fix is to remove that SimpleLruWriteAll() call while
keeping the missing-page initialization logic. The flush is only meant to
make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers
but have not reached disk. In this fallback path, I don't see a way for
the tested next_pageno to be in that state: if RecordNewMultiXact() itself
initializes the page, it writes it synchronously with SimpleLruWritePage()
before setting last_initialized_offsets_page.

I attached a small patch for REL_16_STABLE. The same self-deadlock pattern
is also present on PG 14 and 15. PG 17 and 18 have the same compatibility
call, but SLRU locking is banked there, and RecordNewMultiXact() does not
appear to hold the relevant bank lock before calling SimpleLruWriteAll(),
so I would not describe those branches as having this exact self-deadlock,
though that needs more analysis.

I added both Andrey and Heikki to the To list, since I'm not sure whether
this is more severe than the multixact offset issue we had with 16.12, or
on par with it.

Regards,
Ayush

Attachments:
  [application/octet-stream] v1-0001-Avoid-self-deadlock-on-MultiXactOffsetSLRULock-dur.patch (2.5K)

From b33abeede0847edac3603b87a478a832be1784f8 Mon Sep 17 00:00:00 2001
From: Ayush Tiwari <[email protected]>
Date: Thu, 21 May 2026 07:39:28 +0000
Subject: [PATCH REL_16_STABLE v1] Avoid self-deadlock on MultiXactOffsetSLRULock during WAL replay

Commit 77dff5d937b added a compatibility check in RecordNewMultiXact()
that can call SimpleLruWriteAll(MultiXactOffsetCtl, false) while already
holding MultiXactOffsetSLRULock. In REL_16, SimpleLruWriteAll() tries to
acquire the same SLRU control lock, so WAL replay can self-deadlock with
the startup process waiting on LWLock:MultiXactOffsetSLRU.

The flush is not needed for the page tested in this fallback path. If
RecordNewMultiXact() initializes that offsets page, it writes it
synchronously with SimpleLruWritePage() before updating
last_initialized_offsets_page. Drop the unsafe flush and keep the
existing missing-page initialization logic.

Reported-by: Radim Marek <[email protected]>
Reported-by: Marko Tiikkaja <[email protected]>
Diagnosed-by: Andrey Borodin <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
 src/backend/access/transam/multixact.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index f825579e888..5b6b48eb79c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -934,16 +934,17 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
          * seen any XLOG_MULTIXACT_ZERO_OFF_PAGE records yet, which should
          * happen at most once after starting WAL recovery.
          *
-         * As an extra safety measure, if we do resort to
-         * SimpleLruDoesPhysicalPageExist(), flush the SLRU buffers first so
-         * that it will return an accurate result.
+         *
+         * We cannot call SimpleLruWriteAll() to flush the SLRU buffers
+         * here, because that would self-deadlock on MultiXactOffsetSLRULock,
+         * which we already hold. Fortunately we do not need to: every
+         * page that this code path initializes is synchronously flushed via
+         * SimpleLruWritePage() below before this lock is released, so there
+         * are no relevant dirty pages.
          *----------
          */
         if (last_initialized_offsets_page == -1)
-        {
-            SimpleLruWriteAll(MultiXactOffsetCtl, false);
             init_needed = !SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, next_pageno);
-        }
         else
             init_needed = (last_initialized_offsets_page == pageno);
--
2.43.0
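The page arithmetic behind "79871 is the last multixact on its offsets page" can be checked with a short sketch. It assumes a stock REL_16 build: 8192-byte blocks, 4-byte offsets entries (so 2048 entries per page) and 32 SLRU pages per segment file. Those constants and the helper names (offsets_page() and friends) are assumptions made for this illustration, not values stated in the thread or functions from the PostgreSQL tree.

```c
#include <stdint.h>

/* Assumed defaults for a stock REL_16 build (not stated in the thread). */
#define BLCKSZ                 8192u
#define OFFSETS_PER_PAGE       (BLCKSZ / (unsigned) sizeof(uint32_t)) /* 2048 */
#define SLRU_PAGES_PER_SEGMENT 32u

/* Offsets page that holds the entry for this multixact. */
static uint32_t offsets_page(uint32_t multi)
{
    return multi / OFFSETS_PER_PAGE;
}

/* Slot of this multixact within its offsets page. */
static uint32_t offsets_entry(uint32_t multi)
{
    return multi % OFFSETS_PER_PAGE;
}

/* True when the next multixact lives on a different offsets page. */
static int is_last_on_page(uint32_t multi)
{
    return offsets_page(multi + 1u) != offsets_page(multi);
}

/* Segment file number; file names print this value in hexadecimal. */
static uint32_t offsets_segment(uint32_t multi)
{
    return offsets_page(multi) / SLRU_PAGES_PER_SEGMENT;
}

/* Byte position of the page inside its segment file. */
static uint32_t offsets_file_pos(uint32_t multi)
{
    return (offsets_page(multi) % SLRU_PAGES_PER_SEGMENT) * BLCKSZ;
}
```

Under these assumptions, both multixact IDs seen in the backtraces (79871 and 1201227775) occupy slot 2047, the last one on their page, so replaying them crosses a page boundary. The same arithmetic maps multixact 24958976 from the 16.12 failure to segment 0x17C at byte position 221184, matching the file name and offset in the reported FATAL message.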
* Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
  @ 2026-05-26 08:02 Michael Paquier <[email protected]>
  parent: Ayush Tiwari <[email protected]>
From: Michael Paquier @ 2026-05-26 08:02 UTC
To: Ayush Tiwari <[email protected]>; +Cc: Radim Marek <[email protected]>; Andrey Borodin <[email protected]>; Heikki Linnakangas <[email protected]>; Marko Tiikkaja <[email protected]>; PostgreSQL mailing lists <[email protected]>

On Fri, May 22, 2026 at 10:21:32PM +0530, Ayush Tiwari wrote:
> I think the right fix is to remove that SimpleLruWriteAll() call while
> keeping the missing-page initialization logic. The flush is only meant to
> make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers
> but have not reached disk. In this fallback path, I don't see a way for
> the tested next_pageno to be in that state: if RecordNewMultiXact() itself
> initializes the page, it writes it synchronously with SimpleLruWritePage()
> before setting last_initialized_offsets_page.

FWIW, I have a couple of customers complaining about that as well, as
cross-version physical replication is a thing for minor upgrade flows.
This bug is suddenly making recovery disruptive for some folks out
there. :(

> I attached a small patch for REL_16_STABLE. The same self-deadlock pattern
> is also present on PG 14 and 15. PG 17 and
> 18 have the same compatibility call, but SLRU locking is banked
> there, and RecordNewMultiXact() does not appear to hold the relevant bank
> lock before calling SimpleLruWriteAll(), so I would not describe those
> branches as having this exact self-deadlock, but needs more analysis.

So your root argument is that while the SimpleLruWriteAll() is defensive,
it is not actually necessary, because last_initialized_offsets_page being
-1 means that we have not yet replayed ZERO_OFF_PAGE and that we have no
dirty page that could make SimpleLruDoesPhysicalPageExist() return an
incorrect result, which would be bad. I am not sure I agree that this
assumption is correct all the time; see for example the WAL message
mentioned in the thread that led to 77dff5d937b1:
https://www.postgresql.org/message-id/33319276-e4d0-4773-89e4-09084905fdb0%40iki.fi

I can see this WAL sequence mentioned there, which is possible because
there is no strict ordering in the creation of the mxacts:
ZERO_PAGE:2048 -> CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047

Based on that, if we begin recovery after ZERO_PAGE:2048, we could end up
with this kind of sequence:
CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047

Looking closer, last_initialized_offsets_page stays at -1. The page for
2048 was zeroed before the checkpoint by the earlier ZERO_PAGE:2048.
CREATE_ID:2048 and CREATE_ID:2049 are replayed first. Then comes
CREATE_ID:2047, which enters the last_initialized_offsets_page == -1
branch. If we don't have the WriteAll(), the page where the offsets of
2048 and 2049 are located gets zeroed while creating 2047, corrupting the
existing state of 2048 and 2049.

A different approach would be to release and re-acquire the
MultiXactOffsetSLRULock around the call to SimpleLruWriteAll(), and I
think that it should actually be safe. Even if read-only backends evict
dirty pages between the moment the lock is released and the moment it is
re-acquired in SimpleLruWriteAll(), the pages would be written to disk
due to the eviction, which is what we want for correctness. And only the
startup process dirties offset pages during recovery, AFAIK.

Thoughts?

> Added both Andrey and Heikki in to-mail, since I'm not sure if this
> is more extreme than the multixact offset issue we had with 16.12, or it
> is at par with that.

Indeed, let's wait for at least Heikki's input.

Anyway, for any fixes, I don't think that it would be a good idea to skip
v17 and v18, relying on the SLRU bank locks not conflicting to bypass the
WriteAll() conflict. Let's keep all the branches across v14~v18 in sync.
--
Michael

Attachments:
  [application/pgp-signature] signature.asc (833B)