From: Nazneen Jafri <[email protected]>
To: Andrey Borodin <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: Ayush Tiwari <[email protected]>
Cc: Radim Marek <[email protected]>
Cc: Marko Tiikkaja <[email protected]>
Cc: PostgreSQL mailing lists <[email protected]>
Subject: Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
Date: Tue, 26 May 2026 19:55:14 -0700
Message-ID: <CA+m5N8s5QGqqxu_re+YFv9PRNrisM7D-Cqbhfj=m8FNZLrovhg@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<CAL9smLBMxKBCmsA9UGcmf93bT2_MsZ+POH-oHREuwKdmMU7jfQ@mail.gmail.com>
<[email protected]>
<CAJgoLkJfFgL-V+pYB7=R81AbURTE6sMhzVHDQDhVGnfXRSJ9Wg@mail.gmail.com>
<CAJgoLkKCu0wCwPQZSo5no=XATU-4LMK4QfKBwV928o2uKcxe=g@mail.gmail.com>
<CAJTYsWU6tdEvVh4YKLxz7+amZ7+Wb7_s-FBjsMMeLNj1fKeSNg@mail.gmail.com>
<[email protected]>
<CAJTYsWWXvbBJe+WYJZcnoSTyVz9vk5ro3x2qAq_uvXvK2KwaMQ@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
Tested Andrey's demo.diff on a fresh environment:
- Primary: REL_16_8, Standby: REL_16_14 (--enable-cassert)
- ~2300 MultiXacts crossing the offsets page boundary
- Without patch: startup deadlocks at RecordNewMultiXact(multi=2047)
- With patch: standby replays all WAL and catches up
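
For readers wondering why the hang shows up at multi=2047 specifically, the offsets-page arithmetic can be sketched as below. This is only an illustration: it assumes the default 8 kB block size and 4-byte offset entries, ignores wraparound, and assumes that replaying multi N also touches the offsets entry of N+1 (which is what the DEBUG1 message about the next offsets page suggests). The helper names are hypothetical, not PostgreSQL identifiers.

```python
# Sketch of MultiXact offsets page arithmetic.
# Assumptions (not taken from this thread): 8 kB blocks, 4-byte offset entries.
BLCKSZ = 8192
SIZEOF_MULTIXACT_OFFSET = 4
OFFSETS_PER_PAGE = BLCKSZ // SIZEOF_MULTIXACT_OFFSET  # 2048 entries per page


def offsets_page(multi: int) -> int:
    """Offsets SLRU page that holds the entry for this multixact."""
    return multi // OFFSETS_PER_PAGE


def offsets_entry(multi: int) -> int:
    """Slot of this multixact within its offsets page."""
    return multi % OFFSETS_PER_PAGE


def next_offset_crosses_page(multi: int) -> bool:
    """True if the entry for multi + 1 lives on a different page than multi."""
    return offsets_page(multi + 1) != offsets_page(multi)


if __name__ == "__main__":
    for m in (2046, 2047, 2048):
        print(m, offsets_page(m), offsets_entry(m), next_offset_crosses_page(m))
```

Under these assumptions, multi 2047 is the last entry on page 0 and its successor is the first entry on page 1, which matches the point where the unpatched standby gets stuck.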
Thanks,
Nazneen
On Tue, May 26, 2026 at 2:55 PM Andrey Borodin <[email protected]> wrote:
>
>
> > On 26 May 2026, at 17:28, Heikki Linnakangas <[email protected]> wrote:
> >
> > looks correct
>
> I tested that change as follows.
>
> Set up REL_16_0 as the primary and REL_16_STABLE as the standby.
>
> Generate multixacts in a single session using savepoints:
>
> BEGIN;
> SELECT * FROM t WHERE i = 1 FOR NO KEY UPDATE;
> -- repeat 2500 times:
> SAVEPOINT a; SELECT * FROM t WHERE i = 1 FOR UPDATE; ROLLBACK TO a;
> COMMIT;
>
> Each iteration creates a new MultiXactId. 2500 iterations cross the SLRU page
> boundary at multixact 2048 with some spare multis (we'll pickle the excess
> ones in jars when all is fixed, toying with 2048 wasted dev cycles for no
> reason).
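
The "repeat 2500 times" step in the workload above could be scripted along these lines. This is a minimal sketch, not the script used in the thread: `build_workload` is a hypothetical helper, and it assumes a table `t` with an integer column `i` that already contains a row with `i = 1`.

```python
# Hypothetical generator for the savepoint-based multixact workload quoted above.
# Assumes table t(i int) exists and has a row with i = 1.

def build_workload(iterations: int = 2500, table: str = "t", key: int = 1) -> str:
    """Return one transaction that locks the same row `iterations` times under savepoints."""
    lines = [
        "BEGIN;",
        # Outer lock held by the top-level transaction.
        f"SELECT * FROM {table} WHERE i = {key} FOR NO KEY UPDATE;",
    ]
    for _ in range(iterations):
        # Each subtransaction takes a second lock on the row, then rolls back.
        lines.append(
            f"SAVEPOINT a; SELECT * FROM {table} WHERE i = {key} FOR UPDATE; ROLLBACK TO a;"
        )
    lines.append("COMMIT;")
    return "\n".join(lines) + "\n"
```

Writing the result of `build_workload()` to a file and feeding it to psql with `-f` would run the whole transaction in a single session, as the description requires.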
>
> Test:
> 0. Run the workload on REL_16_0 primary (2500 multixacts, crossing page 0->1)
> 1. Take pg_basebackup
> 2. Run the workload again (2500 more, crossing page 1->2)
> 3. Start the standby
>
> I observe:
> Without the change, startup deadlocks.
> With the change, the standby catches up; the DEBUG1 message "next offsets
> page is not initialized, initializing it now" confirms the compat block
> fires correctly.
>
> I packaged this test into a buildfarm module (TestReplayXversion) [0] that
> builds REL_x_0 and runs this check on a REL_x_STABLE build. It reproduces the
> deadlock on 14, 15, and 16; 17 and 18 pass. Currently I'm struggling to inject
> the regress WAL trace into it; it is not working so far. On the bright side,
> I managed to get PR number 42 in the buildfarm client repo.
>
>
> Best regards, Andrey Borodin.
>
> [0] https://github.com/PGBuildFarm/client-code/pull/42