From: Kirill Reshke
Date: Sun, 22 Mar 2026 19:22:30 +0500
Subject: Re: Bug in MultiXact replay compat logic for older minor version after crash-recovery
To: Heikki Linnakangas
Cc: Andrey Borodin, 段坤仁(刻韧), pgsql-hackers

On Sun, 22 Mar 2026 at 19:15, Kirill Reshke wrote:
>
> Hi!
> I can see that the back-branches commit was included into master [0].
> I think this is good.
>
> On Sun, 22 Mar 2026 at 16:10, Heikki Linnakangas wrote:
> >
> > On 20/03/2026 19:05, Andrey Borodin wrote:
> > >> On 20 Mar 2026, at 18:14, Heikki Linnakangas wrote:
> > >>
> > >> Zeroing the page again is dangerous because the CREATE_ID records can be out of order. The page might already contain some later multixids, and zeroing will overwrite them.
> > >
> > > I see only cases when it's not a problem: we zeroed page, did not flush it, thus did not extend the file, crashed, tested FS, zeroed page once more, overwrote again by replaying WAL, no big deal.
> > > We should never zero a page with offsets that will not be replayed by WAL.
> >
> > I think we're in agreement, but I want to verify because this is
> > important to get right. I was replying to this:
> >
> > > If we are sure the buffers do not hold this page, we can detect it via FS.
> > > Otherwise... nothing bad can happen, actually. We might get a false positive and zero the page once more.
> >
> > My point is that if we rely on SimpleLruDoesPhysicalPageExist(), and it
> > ever returns false even though we had already initialized the page, you
> > can lose data. It's *not* ok to zero a page again that was zeroed
> > earlier already, because we might have already written some real data on it.
>
> +1. Even if we manage to compose a "fix" that zeroes a page more than
> once, this "fix" will not be future-proof and we will corrupt the
> database if anything goes even slightly wrong.
>
> > Let's consider this WAL stream, generated with an old minor version:
> >
> > ZERO_PAGE:2048 -> CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047
> >
> > 2048 is the first multixid on the page. When WAL replay gets to the
> > CREATE_ID:2047 record, it will enter the backwards-compatibility
> > codepath and needs to determine if the page containing the next mxid
> > (2048) already exists.
> >
> > In this WAL sequence, the page already exists because the ZERO_PAGE
> > record was replayed earlier. But if we just call
> > SimpleLruDoesPhysicalPageExist(), it will return 'false' because the
> > page was not flushed to disk yet. If we believe that and zero the page
> > again, we will lose data (the offset for mxid 2049).
> >
> > The opposite cannot happen: if SimpleLruDoesPhysicalPageExist() returns
> > true, then it does really exist.
> >
> > So indeed we can only trust SimpleLruDoesPhysicalPageExist() if we are
> > sure that the page is not sitting in the buffers.
>
> +1
>
> > Attached is a new version. I updated the comment to explain that.
> >
> > I also added another safety measure: before calling
> > SimpleLruDoesPhysicalPageExist(), flush all the SLRU buffers. That way,
> > SimpleLruDoesPhysicalPageExist() should definitely return the correct
> > answer. That shouldn't be necessary because the check with
> > last_initialized_offsets_page should cover all the cases where a page
> > that extended the file is sitting in the buffers, but better safe than
> > sorry.
> >
> > - Heikki
>
> I played with v2 and was unable to fool it into corrupting the db. So v2
> looks good to me.
>
>
> [0] https://git.postgresql.org/cgit/postgresql.git/commit/?id=516310ed4dba89bd300242df0d56b4782f33ed4d
>
> --
> Best regards,
> Kirill Reshke

Also, in the commit message:

> the backwards compatibility logic to tolerate WAL generated by older minor versions

Let's define "older" as pre-789d65364c, to be exact?

--
Best regards,
Kirill Reshke
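
For illustration only, the WAL sequence quoted above (ZERO_PAGE:2048 -> CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047) can be modeled with a toy C program. This is a sketch under stated assumptions, not PostgreSQL code: ToySlru and every toy_* function are hypothetical names, toy_page_exists_on_disk() merely stands in for SimpleLruDoesPhysicalPageExist(), the last_initialized_page field is loosely modeled on the last_initialized_offsets_page check mentioned in the thread, and the offsets 100, 110 and 90 are arbitrary. The additional SLRU flush from v2 is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ENTRIES_PER_PAGE 2048u  /* assumption: mxid 2048 is the first entry on page 1 */
#define NPAGES 2u

/* Toy SLRU: a hypothetical stand-in, not PostgreSQL's real data structure. */
typedef struct ToySlru
{
    unsigned offsets[NPAGES][ENTRIES_PER_PAGE]; /* in-memory page buffers */
    bool     on_disk[NPAGES];                   /* page flushed, file extended */
    int      last_initialized_page;             /* -1 when nothing was zeroed yet */
} ToySlru;

/* Zero a page in the buffers only; nothing is flushed to disk. */
static void
toy_zero_page(ToySlru *slru, unsigned page)
{
    assert(page < NPAGES);
    memset(slru->offsets[page], 0, sizeof(slru->offsets[page]));
    slru->last_initialized_page = (int) page;
}

/* Stand-in for SimpleLruDoesPhysicalPageExist(): looks at disk state only. */
static bool
toy_page_exists_on_disk(const ToySlru *slru, unsigned page)
{
    assert(page < NPAGES);
    return slru->on_disk[page];
}

/* Replay one CREATE_ID record, including the toy compatibility path. */
static void
toy_replay_create_id(ToySlru *slru, unsigned mxid, unsigned offset,
                     bool use_tracking)
{
    unsigned next_mxid = mxid + 1;
    unsigned next_page = next_mxid / ENTRIES_PER_PAGE;

    /* Compat path: the next mxid is the first one on its page. */
    if (next_mxid % ENTRIES_PER_PAGE == 0)
    {
        bool known_initialized =
            use_tracking && slru->last_initialized_page == (int) next_page;

        /* Without tracking, an unflushed page looks like a missing page. */
        if (!known_initialized && !toy_page_exists_on_disk(slru, next_page))
            toy_zero_page(slru, next_page);
    }

    assert(mxid / ENTRIES_PER_PAGE < NPAGES);
    slru->offsets[mxid / ENTRIES_PER_PAGE][mxid % ENTRIES_PER_PAGE] = offset;
}

/* Replays the example stream and returns the surviving offset of mxid 2049. */
unsigned
toy_replay_example(bool use_tracking)
{
    static ToySlru slru;

    memset(&slru, 0, sizeof(slru));
    slru.on_disk[0] = true;             /* assumption: page 0 is already on disk */
    slru.last_initialized_page = -1;

    toy_zero_page(&slru, 1);                                /* ZERO_PAGE:2048 */
    toy_replay_create_id(&slru, 2048, 100, use_tracking);   /* CREATE_ID:2048 */
    toy_replay_create_id(&slru, 2049, 110, use_tracking);   /* CREATE_ID:2049 */
    toy_replay_create_id(&slru, 2047, 90, use_tracking);    /* CREATE_ID:2047 */

    return slru.offsets[1][2049 % ENTRIES_PER_PAGE];
}
```

Under these assumptions, toy_replay_example(false) returns 0, because the page is zeroed a second time and the offset for mxid 2049 is wiped, while toy_replay_example(true) returns 110, because the page is known to have been initialized already and is left alone.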