From: Melanie Plageman
Date: Thu, 18 Dec 2025 15:04:34 -0500
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
To: Kirill Reshke
Cc: Andres Freund, Robert Haas, Andrey Borodin, PostgreSQL Hackers, Heikki Linnakangas, Chao Li

On Thu, Dec 18, 2025 at 10:46 AM Kirill Reshke wrote:
>
> On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
> wrote:
> > > Also, after the whole set is committed, we should then never
> > > experience discrepancy between PD_ALL_VISIBLE and VM bits? Because
> > > they will be set in a single WAL record. The only cases when heap and
> > > VM disagrees on all-visibility then are corruption,
> > > pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
> > > If my understanding is correct, should we add document this?
> >
> > Even on current master, I don't see a scenario other than VM
> > corruption or truncation where PD_ALL_VISIBLE can be set but not the
> > VM (or vice versa). The only way would be if you error out after
> > setting PD_ALL_VISIBLE before setting the VM. Setting PD_ALL_VISIBLE
> > is not in a critical section in lazy_scan_prune(), so it won't panic
> > and dump shared memory, so the buffer with PD_ALL_VISIBLE set may
> > later get written out. But the only obvious way I see to error out of
> > MarkBufferDirty() is if the buffer is not valid -- which would have
> > kept us from doing previous operations on the buffer, I would think.
>
> Well... I may be missing something, but on current HEAD,
> XLOG_HEAP2_PRUNE_VACUUM_SCAN and XLOG_HEAP2_VISIBLE are two different
> record, XLOG_HEAP2_PRUNE_VACUUM_SCAN being always emitted first. So,
> WAL writer may end up kill-9-ed just after
> XLOG_HEAP2_PRUNE_VACUUM_SCAN makes it to the disk, and
> XLOG_HEAP2_VISIBLE never. Crash recovery then, and we have
> discrepancy. This does not happen with a single WAL record.
> Another simple reproducer here: standby streaming, receiving
> XLOG_HEAP2_PRUNE_VACUUM_SCAN from primary, Then network becomes bad,
> and we never get XLOG_HEAP2_VISIBLE from primary. Then we promoted by
> the admin. And again, VM bit vs PD_ALL_VISIBLE discrepancy. Am I
> missing something?

Well, currently XLOG_HEAP2_PRUNE_VACUUM_SCAN doesn't set PD_ALL_VISIBLE.
PD_ALL_VISIBLE is WAL-logged in the XLOG_HEAP2_VISIBLE record because in
lazy_scan_prune() we call PageSetAllVisible() and then
visibilitymap_set() -> log_heap_visible() adds the heap buffer to the
WAL chain (with XLogRegisterBuffer()). Also note that replay of
XLOG_HEAP2_VISIBLE, in heap_xlog_visible(), is where we do
PageSetAllVisible() on the heap page.

So I think you can end up with PD_ALL_VISIBLE set and the VM bit not set
only if you error out precisely between setting PD_ALL_VISIBLE and
WAL-logging it, because we don't set it in a critical section. But you
can't end up with one WAL record that sets PD_ALL_VISIBLE and another
one that sets the VM.

Once we have my code changes, you can never end up with PD_ALL_VISIBLE
set and the VM not set: both are set in the same critical section, so if
we error out, it will cause a panic, which will purge shared memory.

- Melanie
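
To illustrate the replay argument above, here is a toy model, as a sketch
only. It is not PostgreSQL code: ToyRecType, ToyState and the toy_*
functions are made-up names. It assumes, as described above, that the prune
record leaves both flags alone and that replaying the visible record sets
both. It models WAL replay only, not the in-memory window that exists when
PD_ALL_VISIBLE is set outside a critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record types standing in for the two WAL records. */
typedef enum { REC_PRUNE, REC_VISIBLE } ToyRecType;

/* Hypothetical state: one heap page flag and its visibility map bit. */
typedef struct {
    bool pd_all_visible; /* stands in for PD_ALL_VISIBLE on the heap page */
    bool vm_bit;         /* stands in for the all-visible bit in the VM */
} ToyState;

/* Replay a single record against the toy state. */
void toy_replay(ToyState *s, ToyRecType rec)
{
    switch (rec) {
    case REC_PRUNE:
        /* Assumption: the prune record touches neither flag. */
        break;
    case REC_VISIBLE:
        /* Assumption: replay sets the page flag and the VM bit together. */
        s->pd_all_visible = true;
        s->vm_bit = true;
        break;
    }
}

/* Replay only the first n records, as if WAL ended there. */
ToyState toy_replay_prefix(const ToyRecType *wal, int n)
{
    ToyState s = { false, false };

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        toy_replay(&s, wal[i]);
    return s;
}

/* Check that the two flags agree no matter where the WAL is cut off. */
bool toy_flags_always_agree(const ToyRecType *wal, int len)
{
    for (int n = 0; n <= len; n++) {
        ToyState s = toy_replay_prefix(wal, n);

        if (s.pd_all_visible != s.vm_bit)
            return false;
    }
    return true;
}
```

Calling toy_flags_always_agree() on the sequence { REC_PRUNE, REC_VISIBLE }
checks every cut point, including the one where only the prune record was
received, which corresponds to the crash and promoted-standby scenarios in
the quoted text.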