From: Kirill Reshke
Date: Sun, 22 Mar 2026 19:15:38 +0500
Subject: Re: Bug in MultiXact replay compat logic for older minor version after crash-recovery
To: Heikki Linnakangas
Cc: Andrey Borodin, 段坤仁(刻韧), pgsql-hackers

Hi!

I can see that the back-branches commit was also included in master [0]. I think this is good.

On Sun, 22 Mar 2026 at 16:10, Heikki Linnakangas wrote:
>
> On 20/03/2026 19:05, Andrey Borodin wrote:
> >> On 20 Mar 2026, at 18:14, Heikki Linnakangas wrote:
> >>
> >> Zeroing the page again is dangerous because the CREATE_ID records can be out of order. The page might already contain some later multixids, and zeroing will overwrite them.
> >
> > I see only cases when it's not a problem: we zeroed page, did not flush it, thus did not extend the file, crashed, tested FS, zeroed page once more, overwrote again by replaying WAL, no big deal.
> > We should never zero a page with offsets, that will not be replayed by WAL.
>
> I think we're in agreement, but I want to verify because this is
> important to get right. I was replying to this:
>
> > If we are sure buffers have no this page we can detect it via FS.
> > Otherwise... nothing bad can happen, actually. We might get false positive and zero the page once more.
>
> My point is that if we rely on SimpleLruDoesPhysicalPageExist(), and it
> ever returns false even though we had already initialized the page, you
> can lose data. It's *not* ok to zero a page again that was zeroed
> earlier already, because we might have already written some real data on it.

+1. Even if we manage to compose a "fix" that zeroes a page more than once, such a "fix" will not be future-proof, and we will corrupt the database if anything goes even slightly wrong.

> Let's consider this wal stream, generated with old minor version:
>
> ZERO_PAGE:2048 -> CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047
>
> 2048 is the first multixid on the page. When WAL replay gets to the
> CREATE_ID:2047 record, it will enter the backwards-compatibility
> codepath and needs to determine if the page containing the next mxid
> (2048) already exists.
>
> In this WAL sequence, the page already exist because the ZERO_PAGE
> record was replayed earlier. But if we just call
> SimpleLruDoesPhysicalPageExist(), it will return 'false' because the
> page was not flushed to disk yet. If we believe that and zero the page
> again, we will lose data (the offset for mxid 2049).
>
> The opposite cannot happen: if SimpleLruDoesPhysicalPageExist() returns
> true, then it does really exist.
>
> So indeed we can only trust SimpleLruDoesPhysicalPageExist() if we are
> sure that the page is not sitting in the buffers.

+1

> Attached is a new version. I updated the comment to explain that.
>
> I also added another safety measure: before calling
> SimpleLruDoesPhysicalPageExist(), flush all the SLRU buffers.
> That way, SimpleLruDoesPhysicalPageExist() should definitely return the
> correct answer. That shouldn't be necessary because the check with
> last_initialized_offsets_page should cover all the cases where a page
> that extended the file is sitting in the buffers, but better safe than
> sorry.
>
> - Heikki

I played with v2 and was unable to fool it into corrupting the database. So v2 looks good to me.

[0] https://git.postgresql.org/cgit/postgresql.git/commit/?id=516310ed4dba89bd300242df0d56b4782f33ed4d

--
Best regards,
Kirill Reshke