From: Ayush Tiwari
To: Heikki Linnakangas
Cc: pgsql-hackers@postgresql.org
Date: Tue, 17 Mar 2026 20:37:26 +0530
Subject: Re: Proposal: Prevent Primary/Standby SLRU divergence during MultiXact truncation

Hi,

Thank you for the response.

On Tue, 17 Mar 2026 at 03:40, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> Replaying the record will perform the same sanity checks against
> wraparound as the primary does.
>
> Hmm, although why did I not apply commit 817f74600d to 'master', only
> backbranches? The bug that it fixed was related to minor version
> upgrade, and thus it was not needed on 'master', but the code change
> would nevertheless make a lot of sense on 'master' too.

Agreed. Once 817f74600d is on master, the standby would genuinely evaluate the SimpleLruTruncate() wraparound backstop during replay instead of bypassing it.

However, the backstop is documented as catching "wraparound bugs elsewhere in SLRU handling." If such a bug corrupts latest_page_number on the primary, the standby, which derives its own latest_page_number independently from ZERO_OFF_PAGE replay and StartupMultiXact(), would not share that corruption. The primary would then skip the truncation, while the standby would see a healthy latest_page_number and carry it out, leaving the two nodes' SLRUs diverged.
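To make this failure mode concrete, here is a toy model in plain C. Everything in it (ToySlru, toy_page_precedes() and the other toy_* helpers) is hypothetical and not PostgreSQL code: each node holds its own latest_page_number, the truncation record is assumed to be in WAL already, and each node then applies a backstop shaped like the one in SimpleLruTruncate(). toy_page_precedes() is a simplified linear comparison, whereas the real PagePrecedes callbacks compare circularly.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model, not PostgreSQL code: one SLRU as seen by a single node.
 * latest_page_number is derived locally on each node, so corrupting it
 * on the primary does not propagate to the standby.
 */
typedef struct ToySlru
{
    int64_t latest_page_number;
    int64_t oldest_page;        /* first page still present on disk */
} ToySlru;

/* Simplified linear stand-in for ctl->PagePrecedes(). */
static bool
toy_page_precedes(int64_t page1, int64_t page2)
{
    return page1 < page2;
}

/*
 * Shaped like the SimpleLruTruncate() backstop: if the latest page would
 * itself be eligible for removal, skip; otherwise drop pages before
 * cutoff_page.
 */
static void
toy_truncate(ToySlru *slru, int64_t cutoff_page)
{
    if (toy_page_precedes(slru->latest_page_number, cutoff_page))
        return;                 /* "apparent wraparound" */
    if (slru->oldest_page < cutoff_page)
        slru->oldest_page = cutoff_page;
}

/*
 * Current ordering: the TRUNCATE_ID record is already in WAL when the
 * backstop runs, so the primary checks its own state and the standby
 * checks its separately derived state during redo of the same record.
 * Returns true if both nodes end up with the same oldest page.
 */
static bool
toy_nodes_agree_after_truncation(ToySlru *primary, ToySlru *standby,
                                 int64_t cutoff_page)
{
    toy_truncate(primary, cutoff_page);     /* after WAL was written */
    toy_truncate(standby, cutoff_page);     /* replay of that record */
    return primary->oldest_page == standby->oldest_page;
}
```

For example, with a simulated corrupted primary {latest_page_number = 10, oldest_page = 0}, a healthy standby {500, 0} and a cutoff page of 300, the primary skips the deletion and keeps page 0 while the standby advances to page 300, so toy_nodes_agree_after_truncation() returns false.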


> Have you been able to reproduce that?

I have reproduced the primary-side condition on an unmodified tree using gdb in batch mode: attach to the VACUUM backend, let WriteMTruncateXlogRec() return, corrupt latest_page_number, and resume. The primary logs "apparent wraparound" and skips the physical deletion, while pg_waldump confirms that the TRUNCATE_ID record is present in the WAL. I have not yet set up a streaming replica to demonstrate the end-to-end divergence and promotion failure.

> I agree that would probably be better. I'm not sure how straightforward
> it will be to implement though, I wouldn't want to add much extra code
> just for this.

One approach that might keep the footprint small: inline the same PagePrecedes check that SimpleLruTruncate() uses directly in TruncateMultiXact(), before START_CRIT_SECTION(). Something like:

/*
 * Same backstop as in SimpleLruTruncate(), evaluated for both SLRUs
 * before START_CRIT_SECTION(): if the latest page of either SLRU would
 * itself be eligible for removal, skip the truncation without writing WAL.
 */
if (MultiXactOffsetCtl->PagePrecedes(
        pg_atomic_read_u64(&MultiXactOffsetCtl->shared->latest_page_number),
        MultiXactIdToOffsetPage(PreviousMultiXactId(newOldestMulti))) ||
    MultiXactMemberCtl->PagePrecedes(
        pg_atomic_read_u64(&MultiXactMemberCtl->shared->latest_page_number),
        MXOffsetToMemberPage(newOldestOffset)))
{
    ereport(LOG,
            (errmsg("skipping multixact truncation due to apparent wraparound")));
    LWLockRelease(MultiXactTruncationLock);
    return;
}

No new functions and no changes to slru.c or the replay path: just the same condition, evaluated earlier, so that we never enter the critical section or write WAL for a truncation that won't be carried out. Does this seem like a reasonable direction?
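The proposed ordering can be sketched as a toy model in plain C (all names hypothetical, not PostgreSQL code, with a simplified linear stand-in for the circular PagePrecedes callbacks). The primary evaluates the check before anything is logged and returns early, as the snippet above does before START_CRIT_SECTION(); only when the check passes does it append a record to a toy WAL and truncate. The standby acts only on records that were actually written, so in this model a corrupted latest_page_number on the primary results in neither node truncating.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not PostgreSQL code: one SLRU as seen by a single node. */
typedef struct ToySlru
{
    int64_t latest_page_number;  /* derived locally on each node */
    int64_t oldest_page;         /* first page still present on disk */
} ToySlru;

/* Toy WAL holding at most one truncation record. */
typedef struct ToyWal
{
    bool    has_truncate_record;
    int64_t cutoff_page;
} ToyWal;

/* Simplified linear stand-in for ctl->PagePrecedes(). */
static bool
toy_page_precedes(int64_t page1, int64_t page2)
{
    return page1 < page2;
}

/* Drop pages before cutoff_page unless the backstop check fires. */
static void
toy_truncate(ToySlru *slru, int64_t cutoff_page)
{
    if (toy_page_precedes(slru->latest_page_number, cutoff_page))
        return;                 /* "apparent wraparound" */
    if (slru->oldest_page < cutoff_page)
        slru->oldest_page = cutoff_page;
}

/*
 * Primary side of the proposed ordering: check first and return early
 * without logging anything.  Returns true if a record was written.
 */
static bool
toy_primary_truncate(ToySlru *primary, ToyWal *wal, int64_t cutoff_page)
{
    if (toy_page_precedes(primary->latest_page_number, cutoff_page))
        return false;           /* skip: nothing logged, nothing deleted */

    wal->has_truncate_record = true;    /* stands in for the WAL write */
    wal->cutoff_page = cutoff_page;
    toy_truncate(primary, cutoff_page);
    return true;
}

/* Standby side: replay only what the primary actually logged. */
static void
toy_standby_replay(ToySlru *standby, const ToyWal *wal)
{
    if (wal->has_truncate_record)
        toy_truncate(standby, wal->cutoff_page);
}
```

With a simulated corrupted primary latest page of 10, a healthy standby latest page of 500 and a cutoff page of 300, toy_primary_truncate() returns false, no record is written, and replay leaves the standby's oldest page at 0, matching the primary.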

Regards,
Ayush