Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wRnBa-002heJ-0t for pgsql-bugs@arkaria.postgresql.org; Tue, 26 May 2026 08:30:50 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wRnBX-003yBz-0c for pgsql-bugs@arkaria.postgresql.org; Tue, 26 May 2026 08:30:48 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wRnBW-003yBn-2v for pgsql-bugs@lists.postgresql.org; Tue, 26 May 2026 08:30:47 +0000 Received: from mail-yw1-x1133.google.com ([2607:f8b0:4864:20::1133]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wRnBU-00000001UMe-2Ds8 for pgsql-bugs@lists.postgresql.org; Tue, 26 May 2026 08:30:47 +0000 Received: by mail-yw1-x1133.google.com with SMTP id 00721157ae682-7bd4c61765dso98337177b3.3 for ; Tue, 26 May 2026 01:30:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1779784242; cv=none; d=google.com; s=arc-20240605; b=S0eG59dU+eK+0aOKfQVIJ7CrD7ESfD7JbdzIy22gFAKGZvugHEKCTmMxEJfso/hN4x ut30hUUlq5GsJdOnteNNT0E7cSExtJ6FcIMS0pUzI+SmKssrztqTlVSgZPN+2ceeyrcR mBJiQ/QodTGxWtUFPX3AjQl+2K5BoYREcEqM8xrZT6+9iMenvNZuJEmOzfRW459TZl0P xe0fwLSJlThUbAhw26xoJuF7pQ11JHX7vqfbRzN0POULQRMv7bEhGd2ybBlS6ja9wTE3 y1u9AoClgv4bvDtOsYJxr7OwpDF1EGFrc3+/xebsXeAxjY+D1xpYD1/ldDs6h9nTe5Ij 2nsw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=pkihQRMrBF7r+SYGN9/8iVDVM0CB6bcJxscyp/6F9G0=; fh=bEDcjitOvLul0bd9p/lmIZvmgfPCCwXnxB9jjw9m62o=; b=dgyTyx2Xt7eN/1kK+9LB2e0RZNxFJbTKBsVtErXFVT7Ye2liFZfTTLgsmInahjgaAf 
hl43/98dX73EzMrCc6sEQTuIWF7L2rFQwPHx787QvuhXsCCTw18fttbil3gSEpxayDeP xu9KD1LJ38y8DXLMy6pqquv+mTOm83zfIVAGTzN6r0fvZx8EjT2yLp9zYqW0BTeZxJPb KwoSQlluw9EtcOwRs/6XqZo1xYjcSEt9kdgIbT7oGFWXDWxAM4emchhhV+xssLiz43LJ PGQYsLG8cLutYJRnoPrSExsTImG49iKMDBJl9KQVTK5I4x1/g3o7peZMPMeWsSAOAvrr RcRw==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1779784242; x=1780389042; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=pkihQRMrBF7r+SYGN9/8iVDVM0CB6bcJxscyp/6F9G0=; b=US1A8SWQmA5R1i4qk/5KQIhRdl0c9Yd2mipHVG3Ma6HDxo9WeDvaPyBuvJcpD59kt/ T5iaRMQLoL+y1a8clffKDWWtV8rzAGV3YElmYlM5aWeqHfGkCEXmzVLU9aolsnYND2ml Ltv+5Se7Kk04NUSzCWdiVVgSILUkPl4DrToxBQgWrFtS8haDu/eoe1WlCgyiN1kj8X59 97HRknwn2XpmljtPrRGxV7e+t7I5WC0Z1eL2ejbJ+psWu9F3MgMKrMDyLIunb/1wnD6X kEFkFRfHmp+GmT/mr3SRUJAX5c/ZDzMmE5cy5jEMrhJXNZXlpujHP//U1gUb6eQV6tNJ dRdw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779784242; x=1780389042; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=pkihQRMrBF7r+SYGN9/8iVDVM0CB6bcJxscyp/6F9G0=; b=lKtMQUAH4Pt/D0MV5LIBVllNSm+dj9bCbnP1cOBdL6Ykq9BxwuRw7wI9oXPJd7QAvO 9EVH3UY6rSBGom+DImA9hWeeXUFfKoBKWkVD1E1zOA6c1/N51lVusN03pHRCkGXHBKU5 KLZvyWGtRai3BGDG5zIYZ0kJi5ECgy3Y2og6Grv65qHn1zgGyxQvcUp0lxUMZVmWholC cC3L+0yhRV39Fbbj/LduUq13geyWuTqJA/3veCFCi572TFQ1jZ8Tox5H8BEuTq3/qiqp aUdeFn+wBTUBANN2/4GNPeOJuNxUOcHCe2M987DzyaaCPApEb2QPLnKENm0nSfOAmBKA Fy0A== X-Forwarded-Encrypted: i=1; AFNElJ9W9xZkFa8Oetun9fqcRK55Xm4T7IxjOzrL8642N+PdWyWYt63MU7rkov7eTt9EHTJAX/Ctsn2HV62q@lists.postgresql.org X-Gm-Message-State: AOJu0YxB3t5b2nnVdcVfl7xU2QQlUHOTN+Tb7p1R+NCFMtBOcRPquguN gFs3PBzUiSVVhb9U10cJmoiDeQspQMqBRFKs+OtRvhO6onb089bw6TRVbeE9GrBlEcsSoCqcg/M 
ZEu/Oicz5wPrtdxpBeXeeP+k4QBY2nsQ= X-Gm-Gg: Acq92OHUBy+xJkXwiqj2ukZey5oC6r7i4KuDEdnxKx2629vxD4eCJl9FqCl+JFYqFbC i1Nsvcy8nIPzA63TZ3jFHd8p9Us0u/UoHE/HEg2X4Dvoo9tB9f9OU7Au7CKXxIZEhvgw7Um6teU AIQQ2q9cSRD2hQTortqC9/xKxOy+f5KXcbWO5eV+U7Sgh+m3HhiJ3Nc962gQUdRFg4r7KP1Ag/D KzhFVVLKuW8YR2ac9azfsw/XXsckBPluNVBj4ZDfENvOk2Rz5nh+yoNz/gFuhbAPSq60HlM+0ey Bpsw3DB5DTnTkhUYwtouTjUvmaw= X-Received: by 2002:a05:690c:6f01:b0:7c7:e6f6:7adc with SMTP id 00721157ae682-7d3365bfbccmr182993117b3.33.1779784241948; Tue, 26 May 2026 01:30:41 -0700 (PDT) MIME-Version: 1.0 References: <19490-9c59c6a583513b99@postgresql.org> <46FE61C9-F273-45FD-BED7-0F8CDA6EB992@yandex-team.ru> <46DB3CAB-EA1C-41A5-9D6D-5F913A2AAF66@yandex-team.ru> In-Reply-To: From: Ayush Tiwari Date: Tue, 26 May 2026 14:00:30 +0530 X-Gm-Features: AVHnY4J6FR3SL9Dlj-5t7SkcLKRMUYbWu4aN1rxKNhT4LpeaSDs-3MJ_efWl0ew Message-ID: Subject: Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8 To: Michael Paquier Cc: Radim Marek , Andrey Borodin , Heikki Linnakangas , Marko Tiikkaja , PostgreSQL mailing lists Content-Type: multipart/alternative; boundary="00000000000084ab730652b44f69" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk
--00000000000084ab730652b44f69
Content-Type: text/plain; charset="UTF-8"

Hi,

On Tue, 26 May 2026 at 13:32, Michael Paquier wrote:

> On Fri, May 22, 2026 at 10:21:32PM +0530, Ayush Tiwari wrote:
> > I think the right fix is to remove that SimpleLruWriteAll() call while
> > keeping the missing-page initialization logic. The flush is only meant to
> > make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers
> > but have not reached disk. In this fallback path, I don't see a way for
> > the tested next_pageno to be in that state: if RecordNewMultiXact() itself
> > initializes the page, it writes it synchronously with SimpleLruWritePage()
> > before setting last_initialized_offsets_page.
>
> FWIW, I'm having a couple of customers complaining about that as well,
> as cross-version physical replication is a thing for minor upgrade
> flows. This bug is making suddenly recovery disruptive for some folks
> out there. :(
>

We saw a lot of replicas end up in a bad state due to multixact replay
with the 16.12 release, and had to revert the minor version for them
until 16.13 came out, which was a blessing. Given the number of CVEs the
current release fixes, reverting is scary too.

> > I attached a small patch for REL_16_STABLE. The same self-deadlock pattern
> > is also present on PG 14 and 15. PG 17 and
> > 18 have the same compatibility call, but SLRU locking is banked
> > there, and RecordNewMultiXact() does not appear to hold the relevant bank
> > lock before calling SimpleLruWriteAll(), so I would not describe those
> > branches as having this exact self-deadlock, but needs more analysis.
>
> So your root argument is that while the SimpleLruWriteAll() is
> defensive, it is not actually necessary because it means that
> last_initialized_offsets_page is -1 we have not yet replayed
> ZERO_OFF_PAGE and that we have no dirty page that could make
> SimpleLruDoesPhysicalPageExist() return an incorrect result, which
> would be bad. I am not sure to agree that this assumption is correct
> all the time, see for example the WAL message mentioned in the thread
> that has led to 77dff5d937b1:
>
> https://www.postgresql.org/message-id/33319276-e4d0-4773-89e4-09084905fdb0%40iki.fi
>

Right, agreed. Thanks for pointing to that case. My v1 patch removes
the self-deadlock, but the "no relevant dirty pages" assumption is too
strong.

The dirty page does not have to be one initialized by the current
RecordNewMultiXact() call. It can already contain offsets replayed from
later CREATE_ID records while last_initialized_offsets_page is still -1.
In that state, relying directly on SimpleLruDoesPhysicalPageExist() can
still produce a false negative because it only checks the physical file,
not dirty SLRU buffers. So removing the flush could reintroduce
the kind of corruption that 77dff5d937b1 was trying to prevent.

> A different approach would be to release and re-acquire the
> MultiXactOffsetSLRULock while calling SimpleLruWriteAll(), and I think
> that it should be actually safe. Even if read-only backends evict
> dirty pages between the moment the lock is released and the moment it
> is re-acquired in SimpleLruWriteAll(), the pages would be
> written to disk due to the eviction, which is what we want for
> correctness. And only the startup process dirties offset pages during
> recovery, AFAIK. Thoughts?
>

That sounds like the right direction to me.

Releasing MultiXactOffsetSLRULock around SimpleLruWriteAll() preserves
the flush-before-physical-check rule while avoiding the self-deadlock.
I don't see a partial-state problem from the current record at that
point, since the compatibility check happens before RecordNewMultiXact()
has modified the current offsets page. And as you said, during recovery
the startup process should be the only process dirtying offset pages; if
a hot standby reader causes eviction while the lock is released, that
should only help by writing the dirty page out.

> > Added both Andrey and Heikki in to-mail, since I'm not sure if this
> > is more extreme than the multixact offset issue we had with 16.12, or it
> > is at par with that.
>
> Indeed, let's wait for at least Heikki's input.
>
> Anyway, for any fixes, I don't think that it would be a good idea to
> skip v17 and v18, relying on the SLRU bank locks to not conflict to
> bypass the WriteAll() conflict. Let's keep all the branches across
> v14~v18 in sync.
>

Agreed.

Regards,
Ayush

--00000000000084ab730652b44f69--