From: Melanie Plageman
Date: Thu, 18 Dec 2025 10:18:09 -0500
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
To: Kirill Reshke
Cc: Andres Freund, Robert Haas, Andrey Borodin, PostgreSQL Hackers, Heikki Linnakangas, Chao Li

On Thu, Dec 18, 2025 at 3:55 AM Kirill Reshke wrote:
>
> On Thu, 18 Dec 2025 at 05:30, Melanie Plageman wrote:
>
> > If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
> > getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
> > causing us to vacuum an all-frozen empty page.
>
> Yes, vacuum (disable_page_skipping);

Ah, right, that would be a reliable way for it to happen.

> > Then the question is, why wouldn't we have coverage of the empty page
> > first being set all-visible/all-frozen? It can't be COPY FREEZE
> > because the page is empty. And it can't be vacuum, because then we
> > would have coverage. It's very mysterious.

<--snip-->

> I am currently inclined to think that we cannot see an empty page that
> has PD_ALL_VISIBLE not-set. This is because when we make a page empty,
> we are in a critical section, and we WAL-log everything we do, so our
> changes should not be half-made. Maybe as of 608195a3a365, there was a
> case with empty-page-without-PD_ALL_VISIBLE, but I don't think this
> happens on HEAD.

Right, so the way that empty pages get PD_ALL_VISIBLE set is this: when
a page has all its tuples deleted, the next time it is vacuumed it will
be set all-visible and all-frozen and have PD_ALL_VISIBLE set. (If it's
a trailing page it will be truncated, but any non-trailing page will end
up like this.)

But you are right, I don't see any non-error code path where a heap page
would become empty (all line pointers set unused) and then not be set
all-visible. Only vacuum sets line pointers unused, and if all the line
pointers are unused it will always set the page all-visible.
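
To make that sequence concrete, here is a toy model in C. It is only a
sketch under my own assumptions: every name in it (ToyPage,
toy_delete_all, toy_vacuum_page, the TOY_* constants) is made up for
illustration and none of it is the actual heap or vacuum code. It only
models the ordering described above: deleting leaves dead line
pointers, vacuum sets them unused, and a page whose line pointers are
all unused leaves vacuum with the page flag and both VM bits set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; names and values are illustrative, not PostgreSQL's. */
#define TOY_MAX_ITEMS      8
#define TOY_PD_ALL_VISIBLE 0x01    /* page-level flag */
#define TOY_VM_ALL_VISIBLE 0x01    /* visibility map bits for the page */
#define TOY_VM_ALL_FROZEN  0x02

typedef enum { TOY_LP_UNUSED, TOY_LP_NORMAL, TOY_LP_DEAD } ToyLpState;

typedef struct
{
    uint8_t    flags;
    size_t     nitems;
    ToyLpState items[TOY_MAX_ITEMS];
} ToyPage;

/* A page is "empty" when every line pointer is unused. */
static bool
toy_page_is_empty(const ToyPage *page)
{
    for (size_t i = 0; i < page->nitems; i++)
    {
        if (page->items[i] != TOY_LP_UNUSED)
            return false;
    }
    return true;
}

/*
 * Deleting marks tuples dead and invalidates all-visibility, but it
 * does not make the page empty: the line pointers are still in use.
 */
static void
toy_delete_all(ToyPage *page, uint8_t *vmbits)
{
    for (size_t i = 0; i < page->nitems; i++)
    {
        if (page->items[i] == TOY_LP_NORMAL)
            page->items[i] = TOY_LP_DEAD;
    }
    page->flags &= (uint8_t) ~TOY_PD_ALL_VISIBLE;
    *vmbits = 0;
}

/*
 * Only vacuum sets line pointers unused. If that leaves the page empty,
 * it sets the page flag and both VM bits in the same pass. Pages that
 * still hold tuples are outside what this sketch models.
 */
static void
toy_vacuum_page(ToyPage *page, uint8_t *vmbits)
{
    for (size_t i = 0; i < page->nitems; i++)
    {
        if (page->items[i] == TOY_LP_DEAD)
            page->items[i] = TOY_LP_UNUSED;
    }

    if (toy_page_is_empty(page))
    {
        page->flags |= TOY_PD_ALL_VISIBLE;
        *vmbits |= TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN;
    }

    /* On the non-error path, no empty page leaves without the flag. */
    assert(!toy_page_is_empty(page) || (page->flags & TOY_PD_ALL_VISIBLE));
}
```

Calling toy_delete_all() and then toy_vacuum_page() on a page with a few
normal items leaves it empty with the flag and both VM bits set, which
is the only ordering this model allows.
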

I think, though, that if we error out in lazy_scan_prune() after
returning from heap_page_prune_and_freeze() such that we don't set the
empty page all-visible, we can end up with an empty page without
PD_ALL_VISIBLE set. You can see how this might work by patching the VM
set code in lazy_scan_prune() to skip empty pages.

> I did small archeology and this "if (PageIsEmpty(page)) { if
> (!PageIsAllVisible(page)) { .... }}" code originates back to
> 608195a3a365. Comment about not WAL-logged relation extension is from
> a6370fd9ed3d, and I don't think we need to think about this case.

Thanks for looking into this. Even if this code was added to handle the
error codepath I mentioned above, it seems like it would have been good
enough to just let lazy_scan_prune() handle setting the empty page
all-visible the next time the page was vacuumed. Since there is no
non-error code path where this can happen, it doesn't seem like it would
merit its own special case. It is possible it was more common as of
608195a3a365, as you say.

I don't understand how the bug fixed by a6370fd9ed3d can happen. When a
new page is initialized, flags are set to 0, so regardless of WAL
logging of the extension not happening, how would the new page have been
set PD_ALL_VISIBLE? We'll have to ask Andres or Robert about how this
was hit.

> Also, after the whole set is committed, we should then never
> experience discrepancy between PD_ALL_VISIBLE and VM bits? Because
> they will be set in a single WAL record. The only cases when heap and
> VM disagree on all-visibility then are corruption,
> pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
> If my understanding is correct, should we document this?

Even on current master, I don't see a scenario other than VM corruption
or truncation where PD_ALL_VISIBLE can be set but not the VM (or vice
versa). The only way would be if you error out after setting
PD_ALL_VISIBLE before setting the VM.
Setting PD_ALL_VISIBLE is not in a critical section in
lazy_scan_prune(), so an error there won't panic and dump shared memory,
which means the buffer with PD_ALL_VISIBLE set may later get written
out. But the only obvious way I see to error out of MarkBufferDirty() is
if the buffer is not valid, which would have kept us from doing previous
operations on the buffer, I would think.

It's true this will no longer happen after my patches, as
PageSetAllVisible() will happen in a critical section. We could add a
comment about this particular scenario in the code somewhere. But I
don't think we should document it in any user-facing documentation,
since you could still truncate the VM and have the two out of sync.

- Melanie